REMARKS

Claims 1-79 were pending in the application. Claims 23, 24, 30-66, and 73-79 
were withdrawn from consideration as directed to non-elected inventions. 

Claims 1, 3, 12, 25-27, and 67 have been amended. New claim 80 has been 
added. Support for the amendments can be found throughout the application as originally 
filed. 

Claim 22 has been canceled without prejudice to its presentation in future, related 
applications. 

The title has been replaced. 

Upon entry of this amendment, claims 1-21, 25-29, 67-72, and 80 will be pending.
No new matter has been added. 

Information Disclosure Statement 

The Office alleges that the references listed on the PTO-1449 (filed September
18, 2001) were not present in the current application file. Copies of the PTO-1449 filed
on September 18, 2001, and the references cited therein that were apparently misplaced 
by the USPTO will be sent under separate cover. Applicant notes that the Examiner will 
"consider them as though they were submitted with the IDS in paper No. 4" (Office 
Action, page 2). 

Priority 

The Office alleges that the present claims are not supported in the manner 
required by 35 U.S.C. §§ 101 and 112, first paragraph, by the priority application and,
therefore, that the present claims are not entitled to the benefit of the filing date of the 
priority application. The Office alleges that the priority application fails to provide any 
specific, substantial and credible utility and provides no guidance or working examples to 
teach how to use the claimed invention. Applicant respectfully disagrees. 

The crux of the Office's rejection of the priority claim is similar to the rejections 
set forth in the present application, in that the pending claims allegedly lack utility and
are not enabled. However, as discussed below, the pending claims have utility and
enable a person of skill in the art to make and/or use the claimed invention. Since the 
prior application's disclosure is similar to the present application (see, for example, pages 
34-48 of Provisional Serial No. 60/225,262), when the pending claims are found to have 
utility and be enabled, the prior application must also satisfy the requirements under 35
U.S.C. §§ 101 and 112, first paragraph. Therefore, Applicant respectfully requests that the
effective filing date of the present application be recognized as the filing date of the 
priority application, August 15, 2000. 

Title 

The Office has objected to the title as not being descriptive. Although Applicant 
disagrees, in order to further prosecution, Applicant has replaced the title with an even 
more descriptive title. 

Objections 

The specification stands objected to as allegedly failing to provide proper 
antecedent basis for the claimed subject matter. Specifically, the Office alleges that it is
unable to find basis in the specification for the limitation in claim 72 that "a host cell
according to claim 71 that has been co-transfected with a polypeptide [sic]" (emphasis
in original; Office Action, page 3). Applicant respectfully disagrees.

As an initial matter, it is unclear what part of claim 72 the Office objects to. It 
appears that the Office objects to the use of the term "co-transfect" since the Office 
italicized the term. If this is incorrect, however, Applicant respectfully requests that the 
Office further clarify this objection. 

In addition, Applicant respectfully points out that the claim recited in the
Office Action is not an accurate quotation of claim 72 as filed. Claim 72 as filed recites:

A host cell according to claim 71 that has been
co-transfected with a polynucleotide encoding the
nGPCR-1079 amino acid sequence set forth in a sequence of
SEQ ID NO:1 and that expresses the nGPCR-1079 having the
amino acid sequence set forth in SEQ ID NO:2.

(emphasis added). Therefore, if the Office was objecting to the recitation of
"co-transfected with a polypeptide" (emphasis added), Applicant respectfully requests that
this objection be withdrawn in view of the correct quotation of claim 72.

However, if the objection to claim 72 is based on the term "co-transfected,"
Applicant respectfully disagrees that there is no "basis" in the specification for the term.
Claim 72, as set forth above, was filed "as is" with the present application. Therefore,
even if there were no explicit mention of the term "co-transfected" in the remainder of
the specification, the disclosure in the claim itself would serve as written description
support, because support for the claim language can be found in the claim itself (see M.P.E.P.
§ 2163.02). Further, the term "co-transfected" is well known to one of ordinary skill in
the art and, in reference to claim 72, refers to at least two nucleotide sequences being
transfected together into a host cell. The term "co-transfected" also appears in the
specification and is therefore clearly supported by the present application (see, for
example, page 78, paragraph [00282], and page 79, paragraph [00286]).

In view of the foregoing, Applicant respectfully requests that the objection to
claim 72 be withdrawn. 

Claim 67 stands objected to for being dependent upon a non-elected claim. Claim 
67 has been amended so that it no longer depends on a non-elected claim, rendering this
objection moot. In view of the foregoing, Applicant respectfully requests that the 
objection to claim 67 be withdrawn. 

Rejection under 35 U.S.C. § 101 

Claims 1-22, 25-29, and 67-72 stand rejected under 35 U.S.C. § 101 because the 
claimed invention is allegedly not supported by a specific, substantial and credible 
asserted utility or a well established utility. The Office also alleges that the asserted 
utilities are "not considered specific, or substantial because the specification fails to 
provide specific support for these uses, nor any information about the ligand, a particular 
function, or biological significance of the polypeptide encoded by the nucleic acid." 
(Office Action, page 4). Applicant respectfully disagrees.

The Claimed Invention Has A Specific Utility

To meet the utility requirement, the invention must be "practically useful,"
Anderson v. Natta, 480 F.2d 1392, 1397 (CCPA 1973), and confer a "specific benefit" on
the public. Brenner v. Manson, 383 U.S. 519, 534 (1966). The threshold of utility under
this standard is not high, and requires merely an "identifiable" benefit. Juicy Whip Inc. v.
Orange Bang Inc., 51 USPQ2d 1700 (Fed. Cir. 1999). In Stiftung v. Renishaw PLC, 945
F.2d 1173, 1180 (Fed. Cir. 1991), the CAFC explained that "An invention need not be the
best or only way to accomplish a certain result, and it need only be useful to some extent
and in certain applications: '[T]he fact that an invention has only limited utility and is
only operable in certain applications is not grounds for finding lack of utility.' Envirotech
Corp. v. Al George, Inc., 730 F.2d 753, 762, 221 USPQ 473, 480 (Fed. Cir. 1984)."

Inventions that achieve a practical use, a use that is also achieved by other 
inventions, satisfy the utility requirement. Thus practical utilities can be directed to 
classes of inventions, so long as a person of ordinary skill in the art would understand 
how to achieve a practical benefit from knowledge of the class. Montedison, 664 F.2d at 
374-75. For example, many materials conduct electricity. This general utility applies to 
a broad class of inventions (conductive materials) and satisfies the utility requirement of 
section 101. The fact that other materials also conduct electricity does not mean that 
other materials that conduct electricity want for utility. What is important, however, is 
that G protein-coupled receptors (GPCRs) are known to have practical uses well beyond 
throwaway uses like snake food. 

Practical uses for GPCRs include therapeutic and diagnostic uses as well as 
research-based uses. Many medically significant biological processes are mediated by 
signal transduction pathways involving G-proteins and other second messengers, and 
GPCRs are recognized as important therapeutic targets for a wide range of diseases. 
According to a recently issued United States patent, nearly 350 therapeutic agents 
targeting GPCRs have been successfully introduced onto the market in only the last 
fifteen years. (See U.S. Patent No. 6,114,127, at col. 2, lines 45-50.) A recent journal
review reported that most GPCR ligands are small and can be mimicked or blocked with
synthetic analogues. That, together with the knowledge that numerous GPCRs are targets
of important drugs in use today, makes identification of GPCRs "a task of prime
importance." (See, Marchese et al., Trends Pharmacol. Sci., 20(9): 370-5, 1999, attached 
hereto). Thus, the allegation that there is no well established utility for proteins of the 
class that the Applicant is now claiming is directly refuted by industry evidence. 

The Office appears to be under the impression that inventions that are, inter alia, 
useful for use in research, are unpatentable. This is not true. The Patent Office's patent 
database is replete with patents claiming useful research tools, e.g., spectrophotometers. 
A material whose only use is as a tool in research may indeed be patentable. Brenner 
excludes only those research purposes where the only use of the material itself is as the 
subject of research. If Brenner had held otherwise, any chemical material would, by 
virtue of its existence, be useful. However, nowhere do those cases state or imply that a
material cannot be patentable if it has some other beneficial use in research.

Assay methods, like many other tools used in research, have an immediately 
realizable "real world" value. For example, an assay method that can identify chemical 
compounds that possess a particular physical, structural or biological property clearly has 
"real world" value irrespective and independent from the utility that may be associated 
with the compounds identified using the assay method. As a consequence, a presumption 
that assay methods cannot possess utility if the compounds isolated or identified using the
assay do not have utility would be the product of a flawed analysis of Brenner. Such a
conclusion also would suggest that processes and products can never possess utility if 
their utility lies in the field of research. Indeed, the application of this concept of the 
utility requirement as it relates to methods for assaying or identifying compounds, if 
taken literally, would mean that claims to methods such as NMR, infrared, x-ray 
crystallography, and screening for other important biological properties, would be 
unpatentable because further research would be necessary to establish utility for the 
compounds identified or assayed. This certainly cannot be the result intended by the 
Patent Office when issuing the Utility Examination Guidelines.

Genes encoding GPCRs can also be used, for example, for toxicology testing to
generate information useful in activities such as drug development, even in cases where 
little is known as to how a particular GPCR works. No additional experimentation would 
be required, therefore, to determine whether a GPCR has a practical use as all GPCRs 
have at least one practical use. 

Because all GPCRs, as a class, convey practical benefit (much like the class of 
DNA ligases identified in the Training Materials), there should be no need to provide 
additional information about them. A person of ordinary skill in the art need not guess 
whether any given GPCR conveys a practical benefit. Nor is it necessary to know how or 
why any given GPCR works. It is settled law that how or why any invention works is 
irrelevant to determining utility under 35 U.S.C. §101: "[I]t is not a requirement of 
patentability that an inventor correctly set forth, or even know, how or why the invention
works." In re Cortright, 165 F.3d 1353, 1359 (Fed. Cir. 1999) (quoting Newman v.
Quigg, 877 F.2d 1575, 1581 (Fed. Cir. 1989)).

Applicant need only prove a "substantial likelihood" of utility; certainty is not 
required. Brenner, 383 U.S. at 532. The amount of evidence required to prove utility 
depends on the facts of each particular case. In re Jolles, 628 F.2d 1322, 1326 (CCPA 
1980). "The character and amount of evidence may vary, depending on whether the
alleged utility appears to accord with or to contravene established scientific principles
and beliefs." Id. Unless there is proof of "total incapacity," or there is a "complete
absence of data" to support the applicant's assertion of utility, the utility requirement is 
met. Brooktree Corp. v. Advanced Micro Devices, Inc., 977 F.2d 1555, 1571 (Fed. Cir. 
1992); Envirotech, 730 F.2d at 762. The Office has failed to provide proof of "total 
incapacity", and Applicant has provided information that supports the asserted utilities. 

The Office is also reminded that a patent applicant's assertion of utility in the 
disclosure is presumed to be true and correct. In re Cortright, 165 F.3d at 1356; Brana,
51 F.3d at 1566. If such an assertion is made, the Patent Office bears the burden to 
demonstrate that a person of ordinary skill in the art would reasonably doubt that the 
asserted utility could be achieved. Id. To do so, the PTO must provide evidence or sound
scientific reasoning. See In re Langer, 503 F.2d 1380, 1391-92 (CCPA 1974). If and only
if the Patent Office makes such a showing, the burden shifts to the applicant to provide 
rebuttal evidence that would convince the person of ordinary skill that there is sufficient 
proof of utility. Brana, 51 F.3d at 1566. 

Applicant has demonstrated a "substantial likelihood" of utility by showing a
"reasonable correlation" between the utility of the known composition and the
composition being claimed. Fujikawa v. Wattanasin, 93 F.3d 1559, 1565 (Fed. Cir.
1996). The presently claimed GPCR is related to known GPCRs. The Office has not 
provided evidence or sound scientific reasoning that one skilled in the art would doubt 
the "reasonable correlation" advanced by Applicant. 

The present application recites at, for example, pages 36-47 of the specification 
that the claimed invention can be used, inter alia, to identify ligands, protein binding 
partners, and/or modulators. Additionally, the polynucleotides of the present invention 
can be used to generate antibodies useful to localize proteins encoded by the 
polynucleotides of the present invention in vivo or in vitro. The polynucleotides can also 
be used to determine the expression pattern of the gene in various tissues which would 
enable a person of ordinary skill in the art to better understand the function and role of 
the gene in vivo. Thus, there is no question that Applicant has asserted at least one
specific utility and, in fact, has provided numerous specific utilities for the
polynucleotides of the present invention. Accordingly, under Brana, the Patent Office 
must accept the utility asserted by Applicant. 

Additionally, the Office appears to be under the assumption that absolute 
certainty is required for a polynucleotide to have a specific utility. The standard 
applicable in this case is not, however, proof to certainty, but rather proof to reasonable 
probability. As the Supreme Court stated, applicant need only prove a "substantial 
likelihood" of utility; certainty is not required. Brenner v. Manson, 383 U.S. at 532. 
Although there may be numerous inventions that may arise from the present application,
this standard does not justify the Office's stance that the present invention lacks a specific 
utility. Thus, Applicant has complied with the specific utility requirement.

The claimed invention in Brenner was directed to a method whose only utility
was making a class of steroids. The disclosure in Brenner failed to disclose a utility for
the products of that method, which in turn led to a § 101 rejection because the products
resulting from the method lacked utility. The Applicant admitted that the products
produced by the method would not be patentable if they lacked utility. 148 USPQ 696.
The Court stated that the method lacked utility as well, holding:

We find absolutely no warrant for the proposition that although Congress 
intended that no patent be granted on a chemical compound whose sole 
"utility" consists of its potential role as an object of use-testing, a different set 
of rules was meant to apply to the process which yielded the unpatentable 
product. 

148 USPQ 696. 

In Brenner, the method of making the compounds, which was the only use 
recited, was inextricably bound up with the compounds themselves and, as a result, the 
requirement for utility could not be met until a use for the compounds was found. The 
Court emphasized that the utility of the claimed invention (i.e., the products) would 
require further research to identify and ascertain, and the compounds produced by the 
method would be the object of that research. 

In contrast, GPCRs related to known GPCRs stand on a very different basis. As 
discussed, there are a multitude of utilities for the claimed polypeptides, including their 
ability to facilitate research. 

Applicant further asserts that the long-held pre-Brenner case law standard supports
judging the utility of an invention on whether or not the public derives a benefit from the
invention, regardless of how slight the benefit. See, for example, In re Nelson, 280 F.2d
172, 178-180 (C.C.P.A. 1960) (stating that "however slight the advantage which the
public have received from the inventor, it offers a sufficient reason for his
compensation") (citing ROBINSON ON PATENTS (1890)); see also Lowell v. Lewis, 1
Mason 182 (Fed. Case. No. 8568, 1817) (stating "if it be more or less useful is... of no
importance to the public. If it be not extensively useful it will silently sink into contempt
and disregard"). Polypeptides of all types are broadly used in the biotechnology industry,
playing key roles in drug and disease discovery processes. Indeed, many such
polypeptides enable researchers to find the genes associated with physiological functions.
The discovery of such functions readily benefits the public. Accordingly, such tools 
satisfy the pre-Brenner case law standard. 

The Claimed Invention Has A Substantial Utility 

The Utility Examination Guidelines also require a claimed invention to have a 
utility that defines a real-world use (a "substantial utility"). Applicant teaches, as 
described above, that the claimed invention can be used to make antibodies, identify 
ligands and other binding partners, such as other proteins that interact with the 
polypeptide (e.g., a G protein). Thus, it is clear that the claimed invention has real-world
uses. All the uses described in the present application are real-world uses and, again, 
stand in stark contrast to the "throw away" uses (e.g., landfill component or snake food) 
set forth in the utility guidelines. Thus, there is no question that Applicant has asserted at 
least one substantial utility and, in fact, has provided numerous substantial utilities.
Accordingly, Applicant has complied with the substantial utility requirement. 

The Claimed Invention Has A Credible Utility 

In addition to a specific and substantial utility, the Utility Examination Guidelines 
require that such utility be credible (a "credible utility"). That is, the assertion of
utility must be believable to a person of ordinary skill in the art based on the totality of
evidence and reasoning provided. Clearly, the numerous specific and substantial utilities
asserted by Applicant are credible.

Assertions of utility are credible unless "(A) the logic underlying the
assertion is seriously flawed, or (B) the facts upon which the assertion is based are
inconsistent with the logic underlying the assertion." (See, Revised Interim Utility
Guidelines Training Materials.) All the utilities described for the polynucleotide and
polypeptide are based on sound logic. Furthermore, the utilities for the claimed
polynucleotides are not inconsistent with the logic underlying the assertion that the
polynucleotides are useful. Polynucleotides are useful to encode and produce
polypeptides to generate antibodies, identify ligands or protein partners, evaluate
expression patterns, evaluate protein activity, etc. The Office has provided no evidence 
that the logic is seriously flawed or that the facts upon which these assertions are based 
are inconsistent with the logic underlying the assertions. 

In this respect, the G protein coupled receptor family is analogous to the chemical 
genus that was the subject of In re Folkers, 145 USPQ 390 (CCPA 1965) (a compound
that belongs to a class of compounds, members of which are recognized as useful, is
considered useful under § 101). The Patent Office does not serve the public by attempting
to substitute a formulaic analysis of § 101 for the established judgment of the
biopharmaceutical industry as to what is "useful." If the Patent Office is aware of any
well-grounded scientific literature suggesting that GPCRs are not useful, Applicant
requests that it be made of record.

Art-Recognized Utility 

The Utility requirement may also be satisfied by an "Art Established Utility" 
which means that "a person of ordinary skill in the art would immediately appreciate why 
the invention is useful based on the characteristics of the invention. . . and the utility is 
specific, substantial and credible." (M.P.E.P. §2107). 

Applicant points out that products relating to GPCRs for which no
confirmed function has been identified are commercially available. GPCRs, ORF clones
of GPCRs, and antibodies that bind to GPCRs are commercially available. For example,
Applicant points out that FabGennix Inc. of Shreveport, Louisiana sells an antibody
directed to Retinal Anti-GP75. GPCR75 is said to be a GPCR for which a ligand has not
yet been identified (see attached product sheet). Invitrogen sells ORF clones of GPCRs
including those for which a ligand has not yet been identified (see attached list, especially
noting Clone IDs IOH22483, IOH14039, IOH13056, IOH22637, IOH13239, and
IOH13516). MD Bio of Taiwan sells GPCR peptides and antibodies against such
peptides, again where no ligand has yet been identified. That at least three companies
make and sell such GPCR products proves that there is a well-established utility for the
presently claimed GPCR polypeptides. Accordingly, there could be no better proof of the
utilities of the claimed polypeptides: such products are made by a manufacturer (who
expects to sell them) for consumers (who expect to buy them). Any argument that there
is no art-recognized utility for such polypeptides seems meritless.

Applicant also notes for the record that the Patent Office apparently agrees with
Applicant's reasoning that GPCRs are useful in that the Office has granted and
apparently continues to grant patents to G-protein coupled receptors, their encoding
polynucleotides and antibodies directed to them in which no natural substrate or
specific biological significance is ascribed to the GPCR. Specifically, Applicant would
like to bring the following US Patents to the Office's attention:

6,518,414 MacLennan "Molecular Cloning and Expression of G-Protein Coupled
Receptors" (Claims an isolated polynucleotide) 

6,511,826 Li et al. "Polynucleotides Encoding Human G-Protein Chemokine Receptor
(CCR5) HDGNR10" (Claims an isolated polynucleotide encoding a protein identified as
a "chemokine receptor" with no specific chemokine identified)

6,372,891 Soppet et al. "Human G-Protein Receptor HPRAJ70" (Claims an antibody
directed to a G-protein coupled receptor)

6,361,967 Agarwal et al. "AXOR10, A G-Protein Coupled Receptor" (Claims an isolated
polynucleotide) 

6,348,574 Godiska et al. "Seven Transmembrane Receptors" (Claims an antibody
directed to a G-protein coupled receptor) 

6,114,139 Hinuma et al. "G-Protein Coupled Receptor Protein and A DNA Encoding the 
Receptor" (Claims an isolated polynucleotide). 

6,111,076 Fukusumi et al. "Human G-Protein Coupled Receptor (HIBCD07)" (Claims 
isolated polypeptide) 

6,107,475 Godiska et al. "Seven Transmembrane Receptors" (Claims isolated 
polynucleotide and methods) 

6,096,868 Halsey et al. "ECR 673: A 7-Transmembrane G-Protein Coupled Receptor" 
(Claims isolated polypeptide) 

6,090,575 Li et al. "Polynucleotides Encoding Human G-Protein Coupled Receptor
GPR1" (Claims isolated polynucleotide)

6,071,722 Elshourbagy et al. "Nucleic Acids Encoding A G-Protein Coupled 7TM 
Receptor (AXOR-1)" (Claims an isolated polynucleotide) 

6,071,719 Halsey et al. "DNA Encoding ECR 673: A 7-Transmembrane G-Protein
Coupled Receptor" (Claims an isolated polynucleotide)

6,060,272 Li et al. "Human G-Protein Coupled Receptors" (Claims isolated
polynucleotide)

6,048,711 Hinuma et al. "Human G-Protein Coupled Receptor Polynucleotides" (Claims 
isolated polynucleotide)

6,030,804 Soppet et al. "Polynucleotides Encoding G-Protein Parathyroid Hormone
Receptor HLTDG74 Polypeptides" (Claims isolated polynucleotide)

6,025,154 Li et al. "Polynucleotides Encoding Human G-Protein Chemokine Receptor
HDGNR10" (Claims an isolated polynucleotide encoding a protein identified as a
"chemokine receptor" with no specific chemokine identified)

5,998,164 Li et al. "Polynucleotides Encoding Human G-Protein Coupled Receptor
GPRZ" (Claims isolated polynucleotide)

5,994,097 Lai et al. "Polynucleotide Encoding Human G-Protein Coupled Receptor" 
(Claims isolated polynucleotide) 

5,958,729 Soppet et al. "Human G-Protein Receptor HCEGH45" (Claims isolated 
polypeptide) 

5,955,309 Ellis et al. "Polynucleotide Encoding G-Protein Coupled Receptor 
(H7TBA62)" (Claims isolated polynucleotide) 

5,948,890 Soppet et al. "Human G-Protein Receptor HPRAJ70" (Claims isolated 
polypeptide) 

5,945,307 Glucksmann et al. "Isolated Nucleic Acid Molecules Encoding A G-Protein 
Coupled Receptor Showing Homology to The 5HT Family of Receptors" (Claims 
isolated polynucleotide) 

5,942,414 Li et al. "Polynucleotides Encoding Human G-Protein Coupled Receptor
HIBEF51" (Claims isolated polynucleotide)

5,912,335 Bergsma et al. "G-Protein Coupled Receptor HUVCT36" (Claims isolated 
polynucleotide) 

5,874,245 Fukusumi et al. "Human G-Protein Coupled Receptors (HIBCD07)" (Claims 
isolated polynucleotide) 

5,871,967 Shabon et al. "Cloning of A Novel G-Protein Coupled 7TM Receptor" (Claims 
isolated polynucleotide) 

5,869,632 Soppet et al. "Human G-Protein Receptor HCEGH45" (Claims isolated 
polynucleotide) 

5,856,443 MacLennan et al. "Molecular Cloning and Expression of G-Protein Coupled 
Receptors" (Claims isolated polynucleotide) 

5,834,587 Chan et al. "G-Protein Coupled Receptor, HLTEX11" (Claims isolated
polypeptide) 

5,776,729 Soppet et al. "Human G-Protein Receptor HGBER32" (Claims isolated 
polynucleotide) 

5,763,218 Fujii et al. "Nucleic Acid Encoding Novel Human G-Protein Coupled 
Receptors" (Claims isolated polynucleotide) 

5,756,309 Soppet et al. "Nucleic Acid Encoding A Human G-Protein Receptor
HPRAJ70 and Method of Producing the Receptor" (Claims isolated polynucleotide)

5,585,476 MacLennan "Molecular Cloning and Expression of G-Protein Coupled
Receptors" (Claims isolated polynucleotide)

5,759,804 Godiska et al. "Isolated Nucleic Acid Encoding Seven Transmembrane 
Receptors" (Claims isolated polynucleotide and methods)

Applicant asserts that these issued US Patents are evidence of an art recognized
utility for G-protein coupled receptors whose natural ligand is unknown. If the Patent
Office's position is that issued patents are not sufficient evidence of art recognition then 
Applicant respectfully requests that this position be made of record. In the alternative, if 
the Patent Office wishes to take the position that these issued patents are directed to non- 
statutory subject matter, then Applicant respectfully requests that this position be made of 
record. 

The Office also alleges that proteins belonging to the GPCR family, even if they
have similar structures, can have different functions and, therefore, the invention is 
incomplete. However, Applicant does not determine function based on the structure of 
the encoded protein. Rather the prediction is based upon the sequence similarity with 
known polynucleotides or polypeptides encoded thereby. Although different structures 
can be formed by different amino acid sequences thereby allowing proteins with similar 
structures to have different functions, proteins that also share sequence similarity in 
addition to structural similarity are likely to be part of the protein family. It is well known
that the probability that two unrelated polypeptides share more than 40% sequence 
homology over 70 amino acid residues is exceedingly small. Brenner et al., Proc. Natl. 
Acad. Sci. 95:6073-78 (1998) (See, attached reference). In the present application 
homology is in excess of 40% over many more than 70 amino acid residues. The 
probability, therefore, that the polypeptide encoded by the claimed polynucleotides is 
related to the reference polypeptides is, accordingly, very high. 

The Office has failed to provide any references that contradict Brenner's basic rule
and has failed to provide any "countervailing evidence" required by the Utility
Examination Guidelines. Therefore, the Office has failed to meet its burden of providing
evidence indicating that the present invention does not have a specific, substantial, and
credible utility.

In view of the foregoing, Applicant respectfully requests that the rejection under 
35 U.S.C. § 101 be withdrawn.

Rejections under 35 U.S.C. § 112

Claims 1-22, 25-29, and 67-72 stand rejected under 35 U.S.C. § 112, first 
paragraph, as allegedly failing to adequately teach how to use the instant invention. 
According to the Office, "Since the claimed invention is not supported by either a 
specific, substantial or credible utility. . .one skilled in the art clearly would not know how 
to use the claimed invention." (Office Action, page 5). Applicant respectfully disagrees. 

As discussed above, the present invention is supported by a specific, substantial, 
and credible asserted utility as well as a well-established utility. Accordingly, Applicant 
respectfully requests that the rejection be withdrawn. 

The Office also alleges that "even if the specification taught how to use the
nucleic acid encoding the human nGPCR-1079 polypeptide, enablement would not be 
commensurate in scope with claim 1 and the dependent claims 3, 5-22, 25-29, and claims 
67 and 68." (Office Action, page 6). Applicant respectfully disagrees. 

As presently amended, claims 1, 3, 25, and 77 recite polynucleotides that have at
least 90% homology to SEQ ID NO:1 or polypeptides that have at least 95% homology
to SEQ ID NO:2.

The claims, as amended, are not excessively broad. A person of ordinary skill in 
the art would readily understand what is meant by "at least 90% homologous." 
Homology for a polypeptide and nucleic acid molecule is well understood by those of 
ordinary skill in the art and is described in the specification such that the present 
invention can be made and used by the art-skilled. 

A person of ordinary skill in the art would understand that the purified and isolated 
polynucleotides include, for example, polynucleotides that encode polypeptides that 
have mutations when compared to SEQ ID NO: 2. One of skill in the art would readily 
be able to make and use such polynucleotides. 

Claims 1-22, 25-29, and 67-72 are also rejected under 35 U.S.C. § 112, first 
paragraph, as allegedly containing subject matter which was not described in the 
specification in such a way as to enable one skilled in the art to which it pertains, or with 
which it is most nearly connected, to make and/or use the invention. The Office alleges 
that the polypeptide encoded by the nucleic acid molecule is not a complete sequence of a 
GPCR and therefore, it "is unlikely that the present nGPCR-1079 of SEQ ID NO: 2, 
which has merely 1/3 of the minimum length of a GPCR, is a functional GPCR even 
though it may be a portion of a GPCR, and undue experimentation is required prior to 
using the present invention for any purpose as claimed." (Office Action, page 8). 
Applicant respectfully disagrees. 

Although the disclosed sequences may not be full-length GPCRs, there is no 
indication that the present sequences possess no function or that the sequences cannot be 
used for other purposes even if they do not retain GPCR activity. Notably, the Office has 
failed to provide any evidence whatsoever that the polypeptides encoded by the claimed 
polynucleotides do not retain GPCR activity. One of ordinary skill in the art can readily 
determine if the polypeptide encoded by the polynucleotide of the present invention has 
activity. Experiments performed to determine activity are routine and well known by one 
of skill in the art. The Office is respectfully reminded that the relevant issue is not the 
amount of experimentation, but rather whether any experiments that may be performed 
would be undue to one of skill in the art. Enzymatic assays are routine to those of skill in 
the art. Assays to measure GPCR function are also well known in the art and are also 
described in the present application. Therefore, one of skill in the art would know how to 
make and/or use the present invention. 

However, even if the encoded polypeptide did not possess GPCR activity, one of 
skill in the art can still use the polypeptide to raise antibodies, to identify binding partners 
through various assays (i.e., yeast two-hybrid), and the like. These experiments are 
routine to one of ordinary skill in the art and do not impose an undue burden. 

In view of the foregoing, Applicant respectfully requests that the rejection of claims 
under 35 U.S.C. § 112, first paragraph, be withdrawn. 

Claims 1-4, 8, 22, 27, 67, 69, 71, and 72 were also rejected under 35 U.S.C. § 
112, first paragraph, as allegedly containing subject matter which was not described in 
the specification in such a way as to reasonably convey to one skilled in the relevant art 
that the inventors, at the time the application was filed, had possession of the claimed 
invention. Applicant respectfully disagrees. 

DOCKET NO: PHRM0026-100/00329.US1 PATENT 

According to the Office, "only the isolated nucleic acid of SEQ ID NO:1 or 
encoding the amino acid sequence of SEQ ID NO:2, but not the full breadth of the claims 
meets the written description provision of 35 U.S.C. § 112, first paragraph." 

Preliminarily, Applicant thanks the Office for its acknowledgement that written 
description support exists in the specification for nucleic acids encoding SEQ ID NO:2. 

As discussed above, the claims have been amended to recite a specific level of 
homology. Applicant asserts that a skilled artisan can readily envision the structure of 
the claimed polypeptides and nucleic acid molecules based on the present application. 
One of ordinary skill in the art understands that the polynucleotides or the polypeptides of 
the present invention will have at least 90% homology to either SEQ ID NO: 1 or SEQ 
ID NO: 2. This is more than "a mere statement that it is part of the invention". Rather, the 
recited percent homology is a defining structural characteristic of the present invention. 
The present invention encompasses only the nucleic acid molecules or polypeptides with 
at least 90% homology to SEQ ID NO: 1 or SEQ ID NO: 2. 

New claim 80 has been added that recites a nucleic acid molecule that encodes 
a polypeptide that is at least 99% homologous to SEQ ID NO:2. Applicant respectfully 
asserts that the skilled artisan can readily envision the detailed chemical structure of the 
polypeptides encompassed by new claim 80. 

The subject matter encompassed by the pending claims is described in the 
specification in such a way as to reasonably convey to one skilled in the relevant art that 
the inventors, at the time the application was filed, had possession of the claimed 
invention. 

Therefore, Applicant respectfully requests that the rejection of claims under 35 
U.S.C. § 112, first paragraph, be withdrawn. 

Claims 1-22, 25-29 and 72 stand rejected under 35 U.S.C. § 112, second 
paragraph, as allegedly indefinite for failing to particularly point out and distinctly claim 
the subject matter which applicant regards as the invention. Applicant respectfully 
disagrees. 

The Office alleges that claim 1 is indefinite because it is not clear what is meant by 
"homologous." According to the Office, the "claim does not specify the percentage of 
the sequence identity or any other objective measurement." (Office Action, page 9). 
Applicant has amended claim 1 to recite "at least 90% homologous", rendering this 
rejection moot. 

The Office alleges that claim 10 is indefinite for the recitation of "said vector is a 
viral particle". Applicant has amended claim 10 to recite "said vector is a viral vector" 
rendering this rejection moot. 

Claim 22 stands rejected as allegedly indefinite. Applicant has canceled claim 22 
without prejudice, rendering this rejection moot. 

Claims 25 and 26 stand rejected as allegedly indefinite for the recitation of "an 
acceptable carrier or diluent" because it is allegedly unclear what is "acceptable." 
Applicant respectfully disagrees but, in order to further prosecution, has amended claims 
25 and 26 to recite a "pharmaceutically acceptable carrier or diluent." The term 
"pharmaceutically acceptable" is described in the present specification (see, for example, 
pages 32-33) and is also well known to those of skill in the art. 

Claim 27 stands rejected as allegedly indefinite for using the inclusive language 
"and" in "a polypeptide that comprises a sequence of SEQ ID NO 2 and homologs 
thereof." Applicant has amended claim 27 to remove the phrase "and homologs thereof," 
rendering this rejection moot. 

In view of the foregoing, Applicant respectfully requests that the rejection under 
35 U.S.C. § 112, second paragraph, be withdrawn. 

Rejections under 35 U.S.C. § 102 and § 103 

The Office rejected claims 1-22, 25-29, and 67-71 under 35 U.S.C. § 102 and/or § 
103 in view of its erroneous assertion that the effective filing date for the instantly 
claimed invention is August 15, 2001, which is the actual filing date of the instant 
application. 

As discussed above, the effective filing date of the present application is that of its 
priority application, filed August 15, 2000. The Office alleges that the present 
application is not entitled to its priority date because the prior application did not satisfy 
the requirements under 35 U.S.C. § 101 and § 112, first paragraph. However, for the 
reasons set forth above, the prior application does satisfy those requirements, and the 
present application is therefore entitled to the priority date of August 15, 2000. 

Claims 1-22, 25-29, and 67-71 stand rejected under 35 U.S.C. § 102(e) as 
allegedly anticipated by Paszty et al. (US 2002/0123618). The effective date of Paszty is 
August 10, 2001, which is after the effective date of the present application (August 15, 
2000). Therefore, the Paszty reference does not qualify as prior art against the present 
application. 

In view of the foregoing, Applicant requests that the rejection under 35 U.S.C. § 
102(e) be withdrawn. 

Claims 1-9, 13, 16, 20-22, 25, 26, and 69-71 stand rejected under 35 U.S.C. § 
102(a) as allegedly anticipated by Chen et al. (WO 01/36471). The effective date of Chen 
is May 25, 2001, which is after the effective date of the present application (August 15, 
2000). Therefore, the Chen reference does not qualify as prior art against the present 
application. 

In view of the foregoing, Applicant requests that the rejection under 35 U.S.C. § 
102(a) be withdrawn. 

Claims 10-12, 14, 15, 17-19, and 27-29 stand rejected under 35 U.S.C. § 103(a) as 
allegedly unpatentable over Chen in view of Glucksmann et al. (U.S. Patent No. 
5,945,307). Applicant respectfully disagrees. 

As discussed above, the Chen reference does not qualify as prior art for the present 
application. Therefore, the only remaining reference is the Glucksmann reference. The 
Glucksmann reference discusses isolated nucleic acid molecules encoding a G-protein 
coupled receptor showing homology to the 5HT family of receptors. However, the 
Glucksmann reference fails to teach or even suggest SEQ ID NO: 1 or SEQ ID NO:2. 
Therefore, a person of ordinary skill in the art would not have been motivated to use SEQ 
ID NO: 1 or 2 and combine it with what is discussed in Glucksmann. Furthermore, even 
if one of skill in the art were motivated to use the Glucksmann reference, a person of 
ordinary skill in the art would not be in possession of the present invention because the 
reference does not teach or suggest the sequences of the present invention. Therefore, the 
present invention is not obvious in view of the Glucksmann reference. 

In view of the foregoing, Applicant requests that the rejection under 35 U.S.C. § 
103(a) be withdrawn. 

Conclusion 

Applicant believes the claims are in condition for allowance. An early Notice of 
Allowance is therefore earnestly solicited. Applicant invites the Examiner to contact the 
undersigned at (215) 665-6928 to clarify any unresolved issues raised by this response. 



Respectfully submitted, 

Daniel M. Scolnick, Ph.D. 
Reg. No. 52,201 

Date: November 19, 2003 

COZEN O'CONNOR, P.C. 
1900 Market Street 
Philadelphia, PA 19103-3508 
Telephone: (215) 665-2000 
Facsimile: (215) 665-2013 

Attachments: 
Marchese et al., Trends Pharmacol. Sci., 20(9):370-5, 1999 
Brenner et al., Proc. Natl. Acad. Sci. 95:6073-78 (1998) 
Product sheet for Anti-GPCR-75 Antibodies 
Product sheet for GPCR control peptides and antibodies (MD Bio) 
Product sheet for GPCR ORF clones (Invitrogen) 

33 total records for G-Protein Coupled Receptors 

Clone ID | Species | Definition | Gene Symbol 
IOH3294 | Human | complement component 5 receptor 1 (C5a ligand); complement component-5 receptor-2 (C5a ligand) | C5R1 
IOH12614 | Human | purinergic receptor P2Y, G-protein coupled, 11 | P2RY11 
IOH22483 | Human | clone MGC:33224 IMAGE:5267661, mRNA, complete cds. | RDC1 
IOH14039 | Human | Similar to putative nuclear protein ORF1-FL49 | ORF1-FL49 
IOH11484 | Human | glycoprotein Ib (platelet), alpha polypeptide | GP1BA 
IOH1987 | Human | tachykinin receptor 1 isoform short; NK-1 receptor; Tachykinin receptor 1 (substance P receptor; neurokinin-1 receptor); tachykinin 1 receptor (substance P receptor, neurokinin 1 receptor); neurokinin 1 receptor | TACR1 
[illegible] | Human | Similar to POSSIBLE GUSTATORY RECEPTOR CLONE PTE01 | [illegible] 
IOH9916 | Human | coagulation factor II (thrombin) receptor-like 1 | F2RL1 
[illegible] | Human | vasoactive intestinal peptide receptor 2 | VIPR2 
[illegible] | Human | endothelin receptor type A | EDNRA 
[illegible] | Human | Similar to parathyroid hormone receptor 1, clone MGC:34562 IMAGE:5180885, mRNA, complete cds. | PTHR1 
IOH13583 | Human | Duffy blood group | FY 
IOH4585 | Human | cholecystokinin B receptor | CCKBR 
[illegible] | Human | endothelial differentiation, lysophosphatidic acid G-protein-coupled receptor, 4; G protein-coupled receptor; LPA receptor EDG4; Lysophosphatidic acid receptor EDG4 | EDG4 
[illegible] | Human | CD97 antigen isoform 2 precursor; leukocyte antigen CD97; seven-span transmembrane protein | CD97 
[illegible] | Human | formyl peptide receptor-like 1; lipoxin A4 receptor (formyl peptide receptor related) | FPRL1 
[illegible] | Human | adrenomedullin receptor | ADMR 
[illegible] | Human | super conserved receptor expressed in brain 3 | SREB3 

Customer Service: 1 800 786 1236 
Technical Support: 318 219 1123 
Fax: 318 798 1849 

info@fabgennix.com 
www.fabgennix.com 

New Item 

FabGennix Inc. 
INTERNATIONAL 

Novel Orphan retinal G-protein coupled Receptor (GPCR-75) selective antibodies 
Anti-GPCR-75 Antibodies (GPCR75-100P, GPCR75-101AP and GPCR75-112AP) 

Recently a novel human G-protein coupled receptor gene has been characterized and mapped to chromosome 2p16. This gene codes for a 
540 amino acid protein in retinal pigment epithelium (RPE) and cells surrounding retinal arterioles. In contrast, the Northern blot data 
obtained from mouse sections suggest the expression of transcripts in photoreceptor inner segments and outer plexiform layer. The 
transcripts of the GPCR-75 gene (7 kb) are also found in abundance in brain sections. So far, no mutations in GPCR-75 protein were identified 
in patients suffering from Doyne's honeycomb retinal dystrophy (DHRD), an inherited retinal degeneration disease that maps to chromosome 
2p16 (1). 

The GPCR-75 protein is an approximately 78 kDa (540 amino acid) protein that is primarily expressed in human retinal pigment epithelium 
(RPE). The GPCR-75 sequence analyses suggest the presence of 7 transmembrane domains, a characteristic feature of GPCRs. The protein 
has putative N-glycosylation sites near the extracellular N-terminal end of the protein. The protein has a large third intracellular loop, which 
might be the site for interaction of G-proteins. The short carboxy terminal is intracellular and has putative post-translational lipid 
modification sites. 

The Anti-GPCR-75-selective antibodies were generated against conserved sequences near the N- and C-termini of the protein that are unique 
to GPCR-75 protein. The polyclonal antibody strongly labels a 78 kDa protein in RPE cell extracts. Anti-GPCR-75-selective antibody is also 
available in affinity-purified form for confocal, Western blotting and immunocytochemical analyses. FabGennix Int. Inc. will also conjugate 
antibodies with fluorescent probes upon request at extra charge. FabGennix Int. Inc. also provides antibodies against proteins that are 
involved in retinal degenerative diseases, such as various Anti-PDE antibodies, Anti-MERTK, Anti-Phospho-MERTK, EGF-containing fibulin-like 
extracellular protein (EFEMP1), Anti-Myocilin (TIGR), Anti-Bestrophin, Anti-ELOVL4 and Usher syndrome-specific Anti-USH2a 
antibodies, etc. FabGennix Int. Inc. employs cyclic peptide methodology for generating antibodies, which results in higher titer and specificity 
(2). FabGennix Int. Inc. will also provide Western blot positive controls for most of these antibodies in ready-to-use buffer for easy 
identification of respective proteins. Limited quantities of antigens are also available. Please enquire about their availability before ordering. 



Catalog # | Host Species | Nature | Cross reactivity | Quantity | Volume | Price 
GPCR75-100P | Rabbit | Polyclonal antisera | R, M, H | 100 ml | 100 ul | $195.00 
GPCR75-101AP | Rabbit | Affinity purified IgG | R, M, H | 100 ug | 150 ul | $225.00 
GPCR75-112AP | Rabbit | Affinity purified IgG | R, M, H | 100 ug | 150 ul | $225.00 
PC-GPCR75 | N/A | WB positive control | Rat | For 5 App | 60 ul | $75.00 
P-GPCR75 | N/A | [illegible] | n/a | 250 ug | inquire | $65.00 

R = rat; M = mouse; H = human; C = chicken; monk = monkey 
Immunogen: Synthetic cyclic peptide (GPCR75-101AP: PNATSLHVPHSQEONSTS-amide; GPCR75-112AP: 
STSLC[?]LQDUHTATLVTC-amide). 

Concentration: GPCR75-101AP; GPCR75-112AP IgG concentration 0.75-1.15 mg/ml in 50% antibody stabilization buffer. 

Applications: Antibodies GPCR75-100P and GPCR75-101AP are [illegible] for WB. The dilutions for this antibody are for 
reference only; investigators are expected to [...] specific assay in his/her laboratory. Dilutions: WB > 1:500; 
immunoprecipitation & i.p. pull-down assays, 1:250. 

Reactivity: This antibody detects a single 78 kDa Orphan GPCR75 protein in human RPE cell extracts. 

Protocols: Standard protocol for various applications (WB; [...]) [...] product specification sheet; however, 
FabGennix Int. Inc. strongly recommends investigators to optimize conditions for use of this antibody in their lab. 

Form/Storage: The antiserum is supplied in antibody stabilization buffer with 0.02% sodium azide or thimerosal/merthiolate as 
preservative. The affinity-purified antibodies are purified on antigen-sepharose affinity column and supplied as 1- 
1.25 mg/ml IgG in antibody stabilization buffer containing [...] 
properties. For long-term storage of antibodies, store at -20°C. Now these antibodies can be stored at 
[...] immediately without thawing. FabGennix Inc. does not recommend storage of very dilute antibody [...] 
unless they are prepared in specially formulated multi-use antibody dilution [...]. Working 
solutions of antibodies in dilution buffer should be filtered through 0.45 µ filter after every use for long-term storage. 

References 
1. [...] C. A., Gregory-Evans C. Y. Biochem. Biophys. Res. Commun. 260, 174-180, 1999. 
2. Farooqui, S. M., Brock, W. J., Hamdi, A., Prasad, C. (1991) J. Neurochem. 57, 1363-1369. 
[...] may require large amounts of GPCR75-100P or GPCR[...] 
This Product is for Research Use Only and is NOT intended for use in humans or clinical diagnosis. 



[Figure: 78 kDa Orphan Receptor-75 in human RPE cells. Antibody GPCR-100P (1:400)] 

Rat Taste Receptor 2 (TR2) Antibodies 

Cat. # TR21-P, Rat TR2 Control Peptide # 1, SIZE: 100 ug/100 ul 
FORM: Soln / Lyophilized, Lot # 3 11 3? 

Cat. # TR21-S, Rabbit Anti-rat TR2 antiserum # 1, SIZE: 100 ul neat antiserum 
FORM: Soln / Lyophilized, Lot # 38889S 

Cat. # TR21-A, Rabbit Anti-rat TR2 Ab # 1 (affinity pure), SIZE: 100 ug 
FORM: Soln / Lyophilized, Lot # 38889A 



Higher vertebrates are believed to possess at least five basic tastes: sweet, bitter, sour, salty, and umami 
(the taste of monosodium glutamate). Taste receptor cells, which may selectively reside in various parts of 
the tongue and respond to different tastants, perceive these taste modalities. Circumvallate papillae, 
found at the very back of the tongue, are particularly sensitive to bitter substances. Foliate papillae, found 
at the posterior lateral edge of the tongue, are sensitive to sour and bitter. Fungiform papillae at the front 
of the tongue specialize in sweet taste. 

Recently, two novel taste receptors, TR1 and TR2, have been cloned with distinct topographical 
distribution in taste receptor cells and taste buds. TRs are members of a new group of 7 TM domain 
containing GPCRs distantly related to other chemosensory receptors (the Ca2+-sensing receptor (CaSR), a 
family of putative pheromone receptors (V2R), and metabotropic glutamate receptors). TR1 is expressed in 
all fungiform taste buds, whereas TR2 is localized to the circumvallate taste buds. Neither receptor 
co-localizes with gustducin. 

Source of Antigen and Antibodies 

TR1 (rat 840 aa) and TR2 (rat 843 aa) share ~40% homology with each other, ~30% with CaSR, 
and 22-30% with V2R pheromone receptors and mGluRs. Rat TRs are 7 TM domain containing proteins 
with an extra long N-terminal extracellular domain (1). A 19 AA peptide (designated TR21-P; control 
peptide) sequence near the C-terminus of rat TR2 (1) was selected for antibody production. The peptide 
was coupled to KLH, and antibodies were generated in rabbits. Antibody has been affinity purified using 
control peptide-Sepharose. 

Form & Storage 

Control peptide solution is provided in PBS, pH 7.4 at 1 mg/ml (100 ug/100 ul). Antiserum is supplied 
as neat serum (100 ul soln or lyophilized). Affinity pure antibodies were purified over the peptide-Sepharose 
column and supplied as 1 mg/ml soln in PBS, pH 7.4 and 0.1% BSA as stabilizer (100 ul in 
solution or lyophilized). 

The peptides and antibodies also contain 0.1% sodium azide as preservative. Lyophilized products 
should be reconstituted in 100 ul water and gently mixed for 15 min at room temp. All peptide/antibody 
received in solution or reconstituted from lyophilized vials should be stored frozen at -20°C or below in 
suitable aliquots. It is not recommended to store diluted solutions. Avoid repeated freeze and thaw. 

Recommended Usage 

Western Blotting (1:1K-5K for neat serum and 1-10 ug/ml for affinity pure antibody using ECL 
technique). 

ELISA: Control peptide can be used to coat ELISA plates at 1 ug/ml and detected with antibodies (1:10- 
50K for neat serum and 0.5-1 ug/ml for affinity pure). 

Histochemistry & Immunofluorescence: We recommend the use of affinity purified antibody at 1-20 
ug/ml in paraformaldehyde fixed sections of tissues (1). 

Specificity & Cross-reactivity 

The 19 AA rat TR21-P control peptide is specific for rat TR2. It has no significant sequence homology 
with TR1 or gustducin or pheromone receptors. Antibody cross-reactivity in various species has not 
been studied. The TR21-P control peptide is available to confirm specificity of antibodies. 



1. Hoon MA et al. (1999) Cell 96, 541-555; Lindemann B (1999) Nature Med. 5, 381-382 

"Neat antisera" are unpurified antisera and are suitable for ELISA and Western. 
"Affinity pure" antibodies have been purified over the antigen-affinity column and are recommended for 
immunohistochemical applications. 

"Control peptides" cannot be used for Western as they are very short peptides. They are 
intended for ELISA or antibody competition studies. 


Novel GPCRs and their endogenous ligands: 
expanding the boundaries of physiology and pharmacology 

Adriano Marchese, Susan R. George, 
Lee F. Kolakowski Jr, Kevin R. Lynch and 
Brian F. O'Dowd 

Nearly all molecules known to signal cells via G proteins 
have been assigned a cloned G-protein-coupled-receptor 
(GPCR) gene. This has been the result of a decade-long 
genetic search that has also identified some receptors 
for which ligands are unknown; these receptors are 
described as orphans (oGPCRs). More than 80 of these 
novel receptor systems have been identified and the 
emphasis has shifted to searching for novel signalling 
molecules. Thus, multiple neurotransmitter systems 
have eluded pharmacological detection by conventional 
means and the tremendous physiological implications 
and potential for these novel systems as targets for 
drug discovery remains unexploited. The discovery of 
all the GPCR genes in the genome and the identification 
of the unsolved receptor-transmitter systems, by 
determining the endogenous ligands, represents one of 
the most important tasks in modern pharmacology. 



The G-protein-coupled receptors (GPCRs) are transducers 
of extracellular messages and they allow tissues to respond 
to a wide array of signalling molecules. Most of the 
endogenous ligands are small and the binding of these 
ligands to their receptor(s) can be mimicked (or blocked) 
by synthetic analogues. Together with the knowledge that 
numerous GPCRs are targets of important drugs in use 
today, GPCR identification is a task of prime importance. 
In the 14 years since the first cloning of genes for GPCRs, 
most of the molecules known to signal cells via the 
heterotrimeric G-protein-effector systems have been assigned 
a cloned GPCR gene. However, the vigorous search for 
novel GPCR genes has far outpaced the identification of 
novel endogenous ligands. A group of genes has been 
identified whose products are, using the criterion of sequence 
similarity, members of the GPCR family but for which the 
ligands are not known, and these are commonly known 
as orphans (oGPCR). 

The GPCR gene family is the largest known receptor 
family (see Box 1) and shares a common secondary 
structure that consists of seven transmembrane domains. 
Setting aside the odorant receptors (encoded by hundreds 
of genes), nearly 300 mammalian GPCR genes have been 
recognized. On the basis of structure, the GPCRs can be 
separated into three subfamilies. The inclusion of a 
receptor in a subfamily requires the presence of an overall 
percentage amino acid identity and not any discrete [...]. 
Most GPCRs, including the odorant receptors, are grouped 
in Family A. Several additional GPCRs, which have as 
their ligands peptides such as secretin, vasoactive 
intestinal peptide and calcitonin, make up Family B. Family C 
comprises the metabotropic glutamate receptors, the 
Ca2+-sensing receptor, pheromone receptors, the [...] 
receptors and the taste receptors. Within each family, 
GPCRs are grouped by sequence similarity and ligand 
specificity; approximately one third of Family A members 

Box 1. How big is the GPCR family? 

The size of the GPCR family surprised even the most 
optimistic pharmacologist as many subfamilies proved 
to be larger than had been predicted by classical 
pharmacological techniques. Furthermore, some ligands that 
were not widely considered to signal via receptors (e.g. 
nucleotides) are recognized now to have numerous 
receptor subtypes. The discovery of these multiple 
subtypes, new ligands and the rapid accumulation of novel 
GPCR sequences have led to the expectation that many 
more mammalian GPCRs await discovery. Thus, an 
obvious question to ask is: how many GPCR genes are 
there in the human genome? Although simply waiting 
a few years should answer this question directly, there 
are practical implications in making an educated guess 
now. For example, is the receptor for a candidate 
ligand likely to be visible now among the existing oGPCR 
DNAs? And, is further searching for oGPCR DNAs a 
worthwhile endeavour? 

The recent completion of the nematode (Caenorhabditis 
elegans) translated genome provides an interesting 
comparison to mammalian GPCRs. In contrast to the single 
cell yeast (with its two GPCR genes), multicellularity 
obviously demands cell-to-cell communication and the 
added complexity imposes a requirement for a much 
larger repertoire of GPCRs. According to the analysis 
reported by Bargmann (1), 5% of the 19 100 nematode 
genes encode GPCRs. Their distribution among GPCR 
families is reminiscent of the mammalian GPCR genes, 
some 700-1000 chemoattractant (odorant) genes (including 
numerous pseudogenes), approximately 150 Family 
A genes and four-to-five each Family B and C genes. 
By analogy, this suggests that the number of mammalian 
GPCRs could total 5000 (5% of mammalian genes, 
estimated to be 80 000-100 000). Unfortunately, the C. elegans 
genome provides no direct clues for oGPCR identification 
as the closest nematode GPCR is <35% identical 
to any mammalian GPCR and there are no obvious 
homologues to mammalian pre-pro-neuropeptide genes. 
In contrast, the accumulation of nucleotide sequence 
information from another surrogate organism, the zebrafish 
(Danio rerio), should be more informative because 
the conceptualized GPCR amino acid sequences are 
often ~70% identical to orthologous mammalian GPCRs. 

Reference 

1 Bargmann, C. (1998) Science 282, 2028-2033 



are oGPCRs and this review will focus on these receptors. 
Thus, in a decade, the list of signalling molecules for which 
the GPCR genes had not been cloned has been supplanted 
by a list of ~80 oGPCRs awaiting a ligand (see Table 1). 
The characterization of these GPCRs has already enabled 
the discovery of several new endogenous ligands; this 
will be discussed later. 

Novel GPCR gene discovery 

Very few GPCRs have been purified, thus the pace of 
GPCR gene discovery has been fuelled by a series of highly 
successful cloning techniques. The identification (using 
amino acid sequence determination and expression 
cloning) of a few sequences encoding Family A GPCRs 
demonstrated that these were related genes. Cloning by 
low stringency hybridization to cDNA/genomic DNA 
libraries yielded a stream of novel GPCR DNAs. The 
pace of discovery quickened with the use of the 
polymerase chain reaction (PCR). The database of expressed 
sequence tagged cDNAs (ESTs) has provided material 
for a further expansion of Family A, as has the 
high-throughput sequencing of 100-200 kb pair segments of 
human DNA. 

Novel GPCR identification 

Many oGPCRs are found to be similar to known GPCRs. 
Where the identity reaches the threshold of ~45%, it is 
likely that the receptors will share a common ligand, i.e. 
that the oGPCR will be a pharmacological subtype of the 
known GPCR. This rule is not without exception. Take, 
for example, the orphanin FQ/nociceptin receptor; this 
has ~65% amino acid identity to opioid receptors, but does 
not have high affinity for opioid peptides. Many GPCR 
subtypes have <40% amino acid identity, in which case 
sequence comparison might not be profitable. Moreover, 
because the ligand-binding pocket has not yet been 
described fully for any receptor, it is not feasible to predict 
ligand identity. However, dendritic tree building shows 
that receptors that respond to the same, or similar, 
agonists often cluster. For example, most members of the 
prostanoid receptor subfamily share <30% amino acid 
identity, yet these eight receptors are more like one another 
than any other GPCR. A similar situation exists among 
the nucleotide receptors, chemokine receptors and other 
cationic amine receptors. In the way that many known 
GPCRs fall into subfamilies, many oGPCRs cluster 
together, sometimes with members having >50% amino 
acid identity, which suggests that the problem of the ~80 
oGPCRs might be solved by a mere 30 or 40 ligands. For 
example, the recent identification of Edg-1 as a sphingosine 
1-phosphate receptor leads directly to the prediction 
that Edg-3 and Edg-5 (both >50% identical to Edg-1) have 
the same ligand. More distant members of the Edg 
cluster, Edg-2 and Edg-4, are known to be receptors for the 
structurally related ligand, lysophosphatidic acid. 
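
As a rough illustration of the sequence-identity rules of thumb described above, the following Python sketch computes percent amino acid identity from two pre-aligned sequences and maps it onto the ~45% and <35% cut-offs quoted in the text. The function names, the choice to ignore gapped columns, and the toy sequences are assumptions made for illustration only; they are not taken from the review, and, as the orphanin FQ/nociceptin example shows, the thresholds are heuristics with known exceptions.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over ungapped columns of two pre-aligned sequences.

    Assumption: both strings come from the same alignment, use '-' for gaps,
    and columns containing a gap in either sequence are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # gapped column: not counted
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def ligand_hint(identity: float) -> str:
    """Map percent identity onto the heuristic bands quoted in the text."""
    if identity >= 45.0:
        return "may share a ligand with the known GPCR"
    if identity < 35.0:
        return "homology does not inform"
    return "inconclusive"


if __name__ == "__main__":
    # Toy aligned fragments (hypothetical, not real receptor sequences).
    pid = percent_identity("ACDE-F", "ACDQGF")
    print(f"{pid:.1f}% identity: {ligand_hint(pid)}")
```

Any real comparison would start from a proper pairwise or multiple alignment; the sketch only covers the counting step and the threshold lookup.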

When homology does not inform, i.e. the nearest known 
GPCR has <35% amino acid identity to the orphan, ligand 
identification is challenging. There are no signature amino 
acids that predict either the nature of the ligand or the 
identity of the interacting Gα subunit type(s). In those 
cases where the ligand is a molecule with an established 
pharmacology, tissue distribution has allowed inference 
of ligand identity. Thus, an important clue to identifying 
the oGPCR RDC-8 as encoding the adenosine A2a receptor 
was the concordance of in situ hybridization and ligand 
([3H]CGS21680) autoradiography signals in rat brain 
sections. Similarly, the occurrence of both cannabinoid 
binding sites and SKR6 receptor mRNA accumulation in 
NG108 cells led to the identification of the cannabinoid 
CB1 receptor. 
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Table 1 . Amino acid sequence identity of soma orphan G-protein-coupled receptors 



Homology groups (in table order): Opioid and somatostatin receptor-like; Chemokine receptor-like; Chemoattractant receptor-like; Angiotensin receptor-like; Cannabinoid receptor-like; GPR4 receptor-like; Neuropeptide Y receptor-like; Amine receptor-like; P2 receptor-like

Name           Species   % Amino acid identity                 Accession no.
GPR7           Human     S2% GPR8, 40% sst                     U22491
GPR8           Human     B2% GPR7, 45% sst                     U22492
GPR24          Human     33% ssv, 32% sst                      U71032
GPR14          Rat       29% μ-opioid, 28% sst                 U32673
GPR54          Rat       37% gal2, 35% GAL1                    AF115516
GPR2           Human     41% CXCR3, 40% CCR7                   U13667
CKRX           Human     53% EO1, 43% CCR1                     AFOUgSB
EO1            Mouse     53% CKRX, 36% CCR1                    AF03018S
MIP-1αRL1      Mouse     62% CCR1, 50% CCR3                    U28405
GPR28          Human     43% CCR7, 35% CCR6                    U45982
STRL33         Human     37% CCR7, 37% CCR6                    U73529
PPR1           Bovine    39% CCR7, 37% GPR28                   S63848
g10d           Rat       33% RDC1, 30% CCR9                    L09249
RDC1           Human     33% g10d, 30% CXCR2                   X14048
TM7SF1         Human     22% GPRS, 14% CCR6                    AF027826
CIR1           Chicken   51% BLR1, 38% CXCR1                   AF029369
Dez            Human     37% GPR1, 35% FPR2                    U79527
FPRL2          Human     72% FPR2, 56% FPR1                    M7S673
FPR2           Human     72% FPRL2, 69% FPR1                   M78672
GPR1           Human     37% Dez, 34% FPR2                     U13666
Gpnao          Human     32% FPRL2, 32% FPR2                   AF02795e
GPR32          Human     39% FPR1, 35% FPRL2                   AF045764
GPR33          Mouse     36% GPR32, 36% Dez                    AF045766
GPR44          Human     37% Dez, 36% FPRL2                    AF118286
mas oncogene   Human     34% MRG, 26% C5aR                     M13150
MRG            Human     34% mas oncogene, 34% C5aR            S786S3
RTA            Rat       32% mas oncogene, 33% MRG             M32098
GPR53p         Human     35% MRG, 28% mas oncogene             AF096785
GPR15          Human     34% GPR25, 31% APJ                    U34808
GPR25          Human     34% GPR15, 32% APJ                    U91939
GPR3           Human     59% GPR6, 57% GPR12                   U13668
GPR6           Human     59% GPR3, 56% GPR12                   L36150
GPR12          Rat       57% GPR3, 56% GPR6                    U18548
EDG-6          Human     46% EDG-3, 44% EDG-1                  AJ000479
OGR1           Human     48% GPR4, 35% TDAG8                   U48405
GPR4           Human     48% GPR12A, 36% TDAG8                 L36148
TDAG8          Human     36% GPR4, 35% GPR12A                  U95218
G2A            Mouse     34% GPR4, 31% OGR1                    AF083442
GIR            Mouse     35% GPR10, 30% NKj                    M80481
GPR19          Human     27% GAL1, 26% NPY                     U64871
GPR22          Human     26% NPY Y^, 24% CCK                   U66581
PNR            Human     33% S^ff^, 33% 5-HT                   AF021818
GPR26          Human     28% 5-HTsg, 23% SMls
GPR27          Mouse     29% D4, 25% 5-HT                      AF027955
AGR9           Rat       24% Hj, 24% NKj                       S73608
GPR21          Human     27% pjAR, 24% piA«                    ^88580
PSP24          Human     26% 5-HT, 23% p,AR                    ^V^^
GPR45          Human     70% PSP24, 21% NK2                    AF118266
A-2            Human     21% 54(T,f, 19% 5-Kr,t
GPR52          Human     71% GPR21, 27% Hj                     AF096784
HE2            Human     25% Oi/R, 25% a,cAR                   AF091830
GPR57          Human     59% GPR58, 37% PNR                    N/A
GPR58          Human     59% GPR57, 42% PNR                    N/A
GPR61          Human     27% LZY2, 30%                         N/A
GPR62          Human     27% 12Y, 28% 5-HTj                    N/A
GPR23          Human     53% RBintron, 33% TOjo"               U86578
RBintron       Human     53% GPR23, 38% P2Y«                   V^IL,
GPR35          Human     32% GPR23, 30% HM74
P2Y,e          Human     34% RBintron, 33% GPR23
GPR17          Human     35% P2Yi, 34% P2Y                     U33447
GPR18          Human     30% RBintron, 29% GPR17               L42K4
HM74           Human     36% GPR31, 29% P2Y                    0109^
GPR31          Human     36% HM74, 29% P2Y                     U65402

Table 1 (cont.)



Name           Species   % Amino acid identity                 Accession no.

P2 receptor-like (cont.)
RSC338         Human     33% H963, 28% tp2y                    D13626
EBI2           Human     33% RBintron, 30% CCR1                L08177
H963           Human     33% RSC338, 28% PAFR                  AF002986
GPR41          Human     98% GPR42, 41% GPR43                  AF024688
GPR42          Human     98% GPR41, 28% GPR23                  AF024689
GPR40          Human     31% GPR43, 26% CXCR1                  AF024687
GPR43          Human     41% GPR41, 31% GPR40                  AF024690
GPR20          Human     31% P2Y4, 26% GPR23                   U66579
GPR34          Human     31% RSC338, 29% RBintron              AF118670
GPR55          Human     29% P2Y5, 30% GPR23                   AF096786

Neurotensin receptor-like
GHS-R          Human     35% NTS1, 33% nts2                    U60179
GPR33          Human     32% hn^1, 25% nts2                    AF034633
HS0GPCR2       Human     38% GPRaa, 34% GHS-R                  AF044601

Melatonin receptor-like
H9             Human     48% ML1A, 45% ML1B                    U522;g

Endothelin receptor-like
GPR37          Human     68% ETBR-LP-2, 27% ETB                U87460
ETBR-LP-2      Human     68% GPR37, 27% ETB                    Y16280

Glycoprotein hormone receptor-like
LGR5           Human     26% FSH-R, 75% LH-R                   AF062006

Opsin receptor-like
Encephalopsin  Human     32% Peropsin, 31% Rhodopsin           AF140242
RGR            Human     27% Peropsin, 26% Rhodopsin           U15790



Please refer to the TiPS Receptor and Ion Channel Nomenclature Supplement and to individual GenBank accession numbers for further information.



Endogenous ligand identification 

In the same way that EST database searching has yielded
GPCR DNAs, it has also yielded DNAs encoding peptide
sequences related to known peptides. Several novel chemokines
have been discovered using this approach and these
have proven to be the ligands for several chemokine
receptors. For example, a CC chemokine termed ELC
(EBI1-ligand chemokine) was identified from the EST database
and found to be the endogenous ligand for the orphan
receptor EBI1, which has since been renamed CCR7 (Ref.
12). Similarly, the CC chemokine liver and activation-regulated
chemokine (LARC) was identified from the
EST database and subsequently shown to be the ligand
for the orphan STRL22 receptor; it was renamed CCR6
(Refs 14-16). Another EST encoding a CXC chemokine was
isolated, BCA1 (Ref. 17), and later identified as a ligand for
the oGPCR BLR1, which has since been renamed CXCR5
(Ref. 18). A fourth, novel class of chemokines called
δ-chemokines, or CX3C chemokines, was discovered by
automated high-throughput single-pass sequencing and
analysis of a cDNA library constructed from murine
choroid plexus. The sequence of one of the cDNA clones
exhibited similarity to murine monocyte chemoattractant
protein-1 (MCP-1), an α-chemokine. Also, another group
independently searched the EST database with known
chemokine sequences and identified the same chemokine,
which they have termed fractalkine. This ligand was
matched to the orphan receptor V28 (renamed CX3CR1).
The ligand for the novel receptor encoded by GPR5 (Ref.
22) has been identified as the single C motif-1 peptide (Ref. 23)
and the receptor renamed as XC chemokine receptor 1.
The ongoing search for the discovery of novel chemokines
will most certainly reveal novel candidates to test with
the existing chemokine-like orphan receptors and any
additional genes encoding chemokine receptors.

With oGPCR DNAs in hand and with nearly all known
ligands assigned, the task now is to use oGPCR DNAs to
discover novel ligands. The strategy employed is to express
the oGPCR DNA in a cell and apply tissue extracts
until a response is observed. The agonist ligand is then
purified, synthesized and re-tested. This approach has been
most successful in identifying neuropeptides. Peptide
ligands often exhibit high-affinity interactions with their
receptors, which enables detection at low concentrations
and the development of radioligand binding assays. The
first success at orphan ligand identification involved a
GPCR with sequence identity to the opioid receptors. The
natural ligand was identified by two research groups using
brain extracts and the peptide discovered was 17 amino
acids in length, named either orphanin FQ or nociceptin.
The peptide contains the tetrapeptide FGGF, which is
similar to the motif YGGF of the opioid peptides. Another
successful strategy used rat brain fractions that were
applied to cells and Ca2+ mobilization measured; this
succeeded in identifying a novel brain peptide. This peptide
and a related peptide (from the same precursor protein)
bound to two related oGPCRs and these peptides, which
are found in the hypothalamus, function in appetite regulation
and satiety control and thus were named orexins
(also known as hypocretins). In a similar series of experiments,
Hinuma et al. measured arachidonate release
from CHO cells transfected with the GPR10 (Ref. 28) to
identify a novel brain peptide with prolactin-releasing
properties at the anterior pituitary. This group has also
identified another novel peptide, apelin, as the ligand
for the receptor APJ (Ref. 30).

The elusive nature of certain labile natural agonists could
be a significant hindrance to the discovery of oGPCR ligands,
as there is no reason to believe that the remaining
oGPCR ligands will all prove to be peptides. An attempt
to address this problem involves the use of combinatorial
chemistry to generate large libraries of compounds to be
tested as surrogate agonists. Although not the physiological
solution to the problem, such compounds are tools for
probing the pharmacology of an oGPCR. Recently, an interesting
variation to this approach was reported. Yeast expressing
the human formyl peptide receptor-like oGPCR,
FPR2 (Ref. 31), was made dependent on stimulation of
this receptor for growth in histidine-free medium and
then transfected with a plasmid DNA library designed to
express random tridecapeptides. Yeast colonies that were
no longer dependent on histidine were judged to have
undergone autocrine stimulation and the responsible plasmids
recovered. The results yielded a set of six peptides,
one of which elicited Ca2+ mobilization in HEK293 cells
transfected with the FPR2 plasmid.

Ligand-screening assays

There has been a concerted effort to make ligand identification
more efficient by developing cell-based assay systems
that have low endogenous GPCR background or
report G-protein activation events, or both, in a robust,
readily detected manner. The existence of endogenous
GPCR signalling systems is important because overexpression
of one GPCR can elicit an exaggerated response
via other, unrelated and previously unrecognized
endogenous GPCRs (Ref. 32), and this could result in
false positives. The aforementioned yeast expression system
is attractive because of the absence of many endogenous
GPCRs. In essence, it involves replacing the endogenous
pheromone receptor with a mammalian GPCR
and redirecting the pheromone pathway response from
a mitogen-activated protein kinase type activation to a
biosynthetic circuit, thus allowing the synthesis of histidine.
In this case, agonist stimulation allows growth on
histidine-free medium. Potential drawbacks of the yeast
expression system are the difficulties in expressing some
GPCRs, achieving effective receptor-G-protein coupling
and ligand binding to yeast cell wall components.

Another assay system, which uses mammalian cells,
takes advantage of the relatively high expression levels
achieved following transfection of oGPCR DNAs so that
the endogenous, low-level receptors do not interfere. This
system uses the translocation of β-arrestin to receptor sites
on the plasma membrane after agonist-mediated receptor
activation. Barak et al. have shown, using a β-arrestin-2/
green fluorescent protein (βarr2-GFP) fusion protein and
confocal microscopy, that on agonist stimulation of the
β2-adrenoceptor, βarr2-GFP translocates to the plasma
membrane, and that this interaction can be enhanced by
co-expression of G-protein-coupled receptor kinase 2 (Ref.
33). This group also showed that similar responses are
observed with other receptors coupled to different
G proteins, which suggests that the cellular visualization
of the agonist-mediated translocation of βarr2-GFP could
provide a widely applicable method for detecting the
activation of GPCRs.

A system that is useful in measuring GPCR-mediated
activation of Gαs, Gαi/o and Gαq is based on pigment dispersion
or aggregation in cultured Xenopus laevis melanophores.
Increases in cAMP (Gαs-coupled receptors) or
activation of protein kinase C (Gαq) lead to pigment dispersion,
causing darkening of the cells, while decreases in
cAMP (Gαi/o) lead to pigment aggregation near the nucleus
and make the cells appear clear. These colour
changes are detected readily; however, these cells have a
substantial complement of endogenous GPCRs, which
could confound the results. Overexpression of receptors
in melanophores results in changes in the 'basal' signalling
and promotes either the clear or the dark cell colour, thus
predicting either Gαi/o signalling or Gαs or Gαq pathways.

A simpler approach to detecting the activation of multiple
types of G proteins uses Gα16 as a universal adaptor
G protein that can funnel the signal-transduction machinery
down a common pathway, such that a single second
messenger response (Ca2+ mobilization) can be measured
for a given receptor. Heterologous expression of Gα16
allows the coupling of a wide range of GPCRs to phospholipase
activity, and thence to Ca2+ mobilization. For example,
the β2-adrenoceptor normally couples only to Gαs,
but when the β2-adrenoceptor and Gα16 are transiently
co-expressed in COS-7 cells agonist-dependent stimulation
results in inositol phosphate (IP) production.
Receptors linked to Gαs (e.g. dopamine D1, vasopressin
and adenosine A2A receptors) or pertussis-toxin-sensitive
Gαi (e.g. muscarinic acetylcholine M2, 5-HT, formyl
peptide FPR1 and δ-opioid receptors), when co-transfected
with Gα15, also caused concentration-dependent, agonist-mediated
IP generation. Other receptors (e.g. thromboxane
A2 and vasopressin V1) that routinely couple to Gαq
and Gα11 to stimulate IP generation were also shown to
couple effectively to Gα15 and Gα16 (Ref. 38). However,
this coupling is not universal, as the chemokine receptor
CCR1, which effectively couples to Gα14 and Gαi, failed to
couple to Gα16 (Ref. 39).

Other considerations 

Recently, new complexities have been added to the
general approach to studying orphan GPCRs. For instance,
the oGPCR calcitonin receptor-like receptor has been
cloned. The expression of this receptor was consistent
with the expression pattern of a calcitonin gene-related
peptide (CGRP). The efficient binding of CGRP or amylin,
or both, to this receptor required the co-expression of a
cofactor protein called receptor activity modifying protein
1 (RAMP1).

Studies have shown that heterodimerization of two
GPCR subunits is required for the formation of a functional
GABAB receptor. The apparent requirement for
two different gene products to create a GPCR signalling
entity indicates that the characterization of some oGPCRs
might be more complex, perhaps indicating that functional
assays should begin to include co-expression of related
oGPCRs.

In principle, the elimination of a GPCR gene from the
germline and testing the resulting knockout mice for
some change might provide clues to GPCR function, if
not ligand identity. For example, when the mouse BLR1
orphan receptor was disrupted, it yielded mice with
abnormal primary follicles and germinal centres of the
spleen and Peyer's patches, reflecting the inability of
B lymphocytes to migrate into B-cell areas. A novel
peptide that binds and activates BLR-1 was recently
discovered from the EST database.

In view of the number of novel GPCRs that have been
cloned and are continuing to be discovered, it is expected
that many endogenous ligands will be discovered. Unquestionably,
this will result in an increase in the knowledge
of the diversity in intercellular signalling mechanisms
and should lead to novel insights into complex or
poorly understood human disorders; it will also expand
the boundaries of pharmacology. In conclusion, the discovery
of the endogenous ligands will help determine the
precise physiological role for each oGPCR. As the functions
of these novel receptors are uncovered, they could
become targets for the development of new pharmacological
therapies for diseases not previously considered
amenable to pharmacological therapy.

ABSTRACT Pairwise sequence comparison methods have
been assessed using proteins whose relationships are known
reliably from their structures and functions, as described in
the SCOP database [Murzin, A. G., Brenner, S. E., Hubbard, T.
& Chothia C. (1995) J. Mol. Biol. 247, 536-540]. The evaluation
tested the programs BLAST [Altschul, S. F., Gish, W.,
Miller, W., Myers, E. W. & Lipman, D. J. (1990) J. Mol. Biol.
215, 403-410], WU-BLAST2 [Altschul, S. F. & Gish, W. (1996)
Methods Enzymol. 266, 460-480], FASTA [Pearson, W. R. &
Lipman, D. J. (1988) Proc. Natl. Acad. Sci. USA 85, 2444-2448],
and SSEARCH [Smith, T. F. & Waterman, M. S. (1981) J. Mol.
Biol. 147, 195-197] and their scoring schemes. The error rate
of all algorithms is greatly reduced by using statistical scores
to evaluate matches rather than percentage identity or raw
scores. The E-value statistical scores of SSEARCH and FASTA are
reliable: the number of false positives found in our tests agrees
well with the scores reported. However, the P-values reported
by BLAST and WU-BLAST2 exaggerate significance by orders of
magnitude. SSEARCH, FASTA ktup = 1, and WU-BLAST2 perform
best, and they are capable of detecting almost all relationships
between proteins whose sequence identities are >30%. For
more distantly related proteins, they do much less well; only
one-half of the relationships between proteins with 20-30%
identity are found. Because many homologs have low sequence
similarity, most distant relationships cannot be detected by
any pairwise comparison method; however, those which are
identified may be used with confidence.



Sequence database searching plays a role in virtually every 
branch of molecular biology and is crucial for interpreting the 
sequences issuing forth from genome projects. Given the 
method's central role, it is surprising that overall and relative 
capabilities of different procedures are largely unknown. It is 
difficult to verify algorithms on sample data because this 
requires large data sets of proteins whose evolutionary rela- 
tionships are known unambiguously and independently of the 
methods being evaluated. However, nearly all known ho- 
mologs have been identified by sequence analysis (the method 
to be tested). Also, it is generally very difficult to know, in the 
absence of structural data, whether two proteins that lack clear 
sequence similarity are unrelated. This has meant that al- 
though previous evaluations have helped improve sequence 
comparison, they have suffered from insufficient, imperfectly 
characterized, or artificial test data. Assessment also has been 
problematic because high quality database sequence searching 
attempts to have both sensitivity (detection of homologs) and 
specificity (rejection of unrelated proteins); however, these 
complementary goals are linked such that increasing one 
causes the other to be reduced. 



The publication costs of this article were defrayed in part by page charge 
payment. This article must therefore be hereby marked "advertisement" in 
accordance with 18 U.S.C §1734 solely to indicate this fact. 

© 1998 by The National Academy of Sciences 0027-8424/98/956073-6$2.00/0 
PNAS is available online at http://www.pnas.org.



Sequence comparison methodologies have evolved rapidly,
so no previously published tests have evaluated modern versions
of programs commonly used. For example, parameters in
BLAST (1) have changed, and WU-BLAST2 (2), which produces
gapped alignments, has become available. The latest version
of FASTA (3) previously tested was 1.6, but the current release
(version 3.0) provides fundamentally different results in the
form of statistical scoring.

The previous reports also have left gaps in our knowledge. 
For example, there has been no published assessment of 
thresholds for scoring schemes more sophisticated than per- 
centage identity. Thus, the widely discussed statistical scoring 
measures have never actually been evaluated on large data- 
bases of real proteins. Moreover, the different scoring schemes 
commonly in use have not been compared. 

Beyond these issues, there is a more fundamental question: 
in an absolute sense, how well does pairwise sequence com- 
parison work? That is, what fraction of homologous proteins 
can be detected using modern database searching methods? 

In this work, we attempt to answer these questions and to 
overcome both of the fundamental difficulties that have hin- 
dered assessment of sequence comparison methodologies. 
First, we use the set of distant evolutionary relationships in the 
SCOP: Structural Classification of Proteins database (4), which
is derived from structural and functional characteristics (5). 
The SCOP database provides a uniquely reliable set of ho- 
mologs, which are known independently of sequence compar- 
ison. Second, we use an assessment method that jointly mea- 
sures both sensitivity and specificity. This method allows 
straightforward comparison of different sequence searching 
procedures. Further, it can be used to aid interpretation of real 
database searches and thus provide optimal and reliable 
results. 

Previous Assessments of Sequence Comparison. Several
previous studies have examined the relative performance of
different sequence comparison methods. The most encompassing
analyses have been by Pearson (6, 7), who compared
the three most commonly used programs. Of these, the Smith-Waterman
algorithm (8) implemented in SSEARCH (3) is the
oldest and slowest but the most rigorous. Modern heuristics
have provided BLAST (1) the speed and convenience to make
it the most popular program. Intermediate between these two
is FASTA (3), which may be run in two modes offering either
greater speed (ktup = 2) or greater effectiveness (ktup = 1).
Pearson also considered different parameters for each of these
programs.

To test the methods, Pearson selected two representative 
proteins from each of 67 protein superfamilies defined by the 
PIR database (9). Each was used as a query to search the 
database, and the matched proteins were marked as being 
homologous or unrelated according to their membership of PIR
superfamilies. Pearson found that modern matrices and "ln-scaling"
of raw scores improve results considerably. He also
reported that the rigorous Smith-Waterman algorithm worked
slightly better than FASTA, which was in turn more effective
than BLAST.

Abbreviation: EPQ, errors per query.

Very large scale analyses of matrices have been performed
(10), and Henikoff and Henikoff (11) also evaluated the
effectiveness of BLAST and FASTA. Their test with BLAST
considered the ability to detect homologs above a predetermined
score but had no penalty for methods which also
reported large numbers of spurious matches. The Henikoffs
searched the SWISS-PROT database (12) and used PROSITE (13)
to define homologous families. Their results showed that the
BLOSUM62 matrix (14) performed markedly better than the
extrapolated PAM-series matrices (15), which previously had
been popular.

A crucial aspect of any assessment is the data that are used
to test the ability of the program to find homologs. But in
Pearson's and the Henikoffs' evaluations of sequence comparison,
the correct results were effectively unknown. This is
because the superfamilies in PIR and PROSITE are principally
created by using the same sequence comparison methods
which are being evaluated. Interdependency of data and
methods creates a "chicken and egg" problem, and means, for
example, that new methods would be penalized for correctly
identifying homologs missed by older programs. For instance,
immunoglobulin variable and constant domains are clearly
homologous, but PIR places them in different superfamilies.
The problem is widespread: each superfamily in PIR 48.00 with
a structural homolog is itself homologous to an average of 1.6
other PIR superfamilies (16).

To surmount these sorts of difficulties, Sander and Schnei- 
der (17) used protein structures to evaluate sequence com- 
parison. Rather than comparing different sequence compari- 
son algorithms, their work focused on determining a length- 
dependent threshold of percentage identity, above which all 
proteins would be of similar structure. A result of this analysis 
was the HSSP equation; it states that proteins with 25% identity 
over 80 residues will have similar structures, whereas shorter 
alignments require higher identity. (Other studies also have 
used structures (18-20), but these focused on a small number 
of model proteins and were principally oriented toward eval- 
uating alignment accuracy rather than homology detection.) 
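
The length-dependent threshold described above can be sketched in a few lines. The sketch assumes the form of the HSSP equation quoted in the Fig. 1 legend of this paper (H = 290.15 l^{-0.562} for alignment lengths between 10 and 80, H = 24.7 for longer alignments, and a threshold above 100% for alignments shorter than 10). The function names, the use of infinity for short alignments and the treatment of the boundary lengths 10 and 80 are illustrative assumptions.

```python
import math


def hssp_threshold(length: int) -> float:
    """Percent-identity threshold H for an alignment of the given length.

    Assumptions: lengths of exactly 10 and 80 use the power-law branch;
    alignments shorter than 10 get an unreachable threshold (infinity),
    standing in for "H > 100".
    """
    if length <= 0:
        raise ValueError("alignment length must be positive")
    if length < 10:
        return math.inf
    if length > 80:
        return 24.7
    return 290.15 * length ** -0.562


def hssp_adjusted_identity(percent_identity: float, length: int) -> float:
    """Percent identity within the alignment minus the threshold H."""
    return percent_identity - hssp_threshold(length)


if __name__ == "__main__":
    for n in (20, 40, 80, 200):
        print(n, round(hssp_threshold(n), 1))
```

A positive adjusted score means the alignment lies above the threshold curve; the power-law branch meets the flat branch at a length of 80, where it evaluates to about 24.7.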

A general solution to the problem of scoring comes from
statistical measures (i.e., E-values and P-values) based on the
extreme value distribution (21). Extreme value scoring was
implemented analytically in the BLAST program using the
Karlin and Altschul statistics (22, 23) and empirical approaches
have been recently added to FASTA and SSEARCH. In
addition to being heralded as a reliable means of recognizing
significantly similar proteins (24, 25), the mathematical tractability
of statistical scores "is a crucial feature of the BLAST
algorithm" (1). The validity of this scoring procedure has been
tested analytically and empirically (see ref. 2 and references in
ref. 24). However, all large empirical tests used random
sequences that may lack the subtle structure found within
biological sequences (26, 27) and obviously do not contain any
real homologs. Thus, although many researchers have suggested
that statistical scores be used to rank matches (24, 25,
28), there have been no large rigorous experiments on biological
data to determine the degree to which such rankings are
superior.
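
As a rough illustration of how a raw score becomes a statistical score, the sketch below uses the standard Karlin-Altschul relations E = K m n exp(-lambda S) and P = 1 - exp(-E). These formulas are not written out in the text above, and the values of lambda and K in the example are placeholders chosen for illustration, not the parameters of any particular program or matrix.

```python
import math


def karlin_altschul_evalue(raw_score: float, query_len: int, db_len: int,
                           lam: float, k: float) -> float:
    """Expected number of chance matches scoring at least raw_score.

    lam and k depend on the scoring matrix and residue composition;
    the caller must supply them (hypothetical values are used below).
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


def pvalue_from_evalue(evalue: float) -> float:
    """Probability of at least one chance match, P = 1 - exp(-E)."""
    return -math.expm1(-evalue)


if __name__ == "__main__":
    # Placeholder parameters, for illustration only.
    e = karlin_altschul_evalue(raw_score=60, query_len=300,
                               db_len=1_000_000, lam=0.3, k=0.1)
    print(e, pvalue_from_evalue(e))
```

For small E the two measures are nearly equal, which is why an E-value can also be read as an approximate error rate per query.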

A Database for Testing Homology Detection. Since the 
discovery that the structures of hemoglobin and myoglobin are 
very similar though their sequences are not (29), it has been 
apparent that comparing structures is a more powerful (if less 
convenient) way to recognize distant evolutionary relation- 
ships than comparing sequences. If two proteins show a high 
degree of similarity in their structural details and function, it
is very probable that they have an evolutionary relationship
though their sequence similarity may be low.

The recent growth of protein structure information com- 
bined with the comprehensive evolutionary classification in 
the SCOP database (4, 5) have allowed us to overcome previous 
limitations. With these data, we can evaluate the performance 
of sequence comparison methods on real protein sequences 
whose relationships are known confidently. The SCOP database
uses structural information to recognize distant homologs, the 
large majority of which can be determined unambiguously. 
These superfamilies, such as the globins or the immunoglobu- 
lins, would be recognized as related by the vast majority of the 
biological community despite the lack of high sequence sim- 
ilarity. 

From SCOP, we extracted the sequences of domains of
proteins in the Protein Data Bank (PDB) (30) and created two
databases. One (PDB90D-B) has domains that were all <90%
identical to any other, whereas the other (PDB40D-B) had those <40%
identical. The databases were created by first sorting all
protein domains in SCOP by their quality and making a list. The
highest quality domain was selected for inclusion in the
database and removed from the list. Also removed from the list
(and discarded) were all other domains above the threshold
level of identity to the selected domain. This process was
repeated until the list was empty. The PDB40D-B database
contains 1,323 domains, which have 9,044 ordered pairs of
distant relationships, or ~0.5% of the total 1,749,006 ordered
pairs. In PDB90D-B, the 2,079 domains have 53,988 relationships,
representing 1.2% of all pairs. Low complexity regions
of sequence can achieve spurious high scores, so these were
masked in both databases by processing with the SEG program
(27) using recommended parameters: 12 1.8 2.0. The databases
used in this paper are available from http://sss.stanford.edu/
sss/, and databases derived from the current version of SCOP
may be found at http://scop.mrc-lmb.cam.ac.uk/scop/.
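
The selection procedure described above can be sketched as a simple greedy filter. This is an illustration under stated assumptions, not the authors' code: the `quality` and `identity` callables and the toy data in the usage example are hypothetical stand-ins for whatever quality ranking and pairwise identity measure were actually used.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def select_nonredundant(domains: Sequence[T],
                        quality: Callable[[T], float],
                        identity: Callable[[T, T], float],
                        threshold: float) -> List[T]:
    """Greedy selection: keep the best remaining domain, then discard
    every other domain at or above the identity threshold to it."""
    remaining = sorted(domains, key=quality, reverse=True)
    selected: List[T] = []
    while remaining:
        best = remaining.pop(0)          # highest quality domain left
        selected.append(best)
        remaining = [d for d in remaining if identity(best, d) < threshold]
    return selected


def ordered_pairs(n: int) -> int:
    """Number of ordered (query, target) pairs among n distinct domains."""
    return n * (n - 1)


if __name__ == "__main__":
    # Toy data: names sharing a first letter are treated as 95% identical.
    toy_quality = {"a1": 3.0, "a2": 2.0, "b1": 1.0}
    same_family = lambda x, y: 95.0 if x[0] == y[0] else 20.0
    print(select_nonredundant(list(toy_quality), toy_quality.get,
                              same_family, threshold=90.0))
    print(ordered_pairs(1323), ordered_pairs(2079))
```

The pair counts reproduce the figures in the text: 1,323 domains give 1,749,006 ordered pairs, of which 9,044 is about 0.5%, and 53,988 of the 2,079-domain total is about 1.2%.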

Analyses from both databases were generally consistent, but
PDB40D-B focuses on distantly related proteins and reduces the
heavy overrepresentation in the PDB of a small number of
families (31, 32), whereas PDB90D-B (with more sequences)
improves evaluations of statistics. Except where noted otherwise,
the distant homolog results here are from PDB40D-B.
Although the precise numbers reported here are specific to the
structural domain databases used, we expect the trends to be
general.

Assessment Data and Procedure. Our assessment of sequence
comparison may be divided into four different major
categories of tests. First, using just a single sequence compar- 
ison algorithm at a time, we evaluated the effectiveness of 
different scoring schemes. Second, we assessed the reliability 
of scoring procedures, including an evaluation of the validity 
of statistical scoring. Third, we compared sequence compari- 
son algorithms (using the optimal scoring scheme) to deter- 
mine their relative performance. Fourth, we examined the 
distribution of homologs and considered the power of pairwise 
sequence comparison to recognize them. All of the analyses 
used the databases of structurally identified homologs and a 
new assessment criterion. 

The analyses tested blast (1), version 1.4.9MP, and wu- 
BLAST2 (2), version 2.0al3MP. Also assessed was the FASTA 
package, version 3.0t76 (3), which provided fasta and the 
SSEARCH implementation of Smith-Waterman (8). For 
SSEARCH and FASTA, wc used BLOSUM45 with gap penalties 
-12/-1 (7, 16). The default parameters and matrix (blo- 
SUM62) were used for blast and wu-bi^T2, 

The "Coverage Vs. Error" Plot To test a particular protocol 
(comprising a program and scoring scheme), each sequence 
from the database was used as a query to search the database. 
This yielded ordered pairs of query and target sequences with 
associated scores, which were sorted, on the basis of their 
scores, from best to worst. ThcMdeal method would have 
Fig. 1. Coverage vs. error plots of different scoring schemes for SSEARCH Smith-Waterman. (A) Analysis of PDB40D-B database. (B) Analysis 
of PDB90D-B database. All of the proteins in the database were compared with each other using the SSEARCH program. The results of this single 
set of comparisons were considered using five different scoring schemes and assessed. The graphs show the coverage and errors per query (EPQ) 
for statistical scores, raw scores, and three measures using percentage identity. In the coverage vs. error plot, the x axis indicates the fraction of 
all homologs in the database (known from structure) which have been detected. Precisely, it is the number of detected pairs of proteins with the 
same fold divided by the total number of pairs from a common superfamily. PDB40D-B contains a total of 9,044 homologs, so a score of 10% indicates 
identification of 904 relationships. The y axis reports the number of EPQ. Because there are 1,323 queries made in the PDB40D-B all-vs.-all 
comparison, 13 errors correspond to 0.01, or 1% EPQ. The y axis is presented on a log scale to show results over the widely varying degrees of 
accuracy which may be desired. The scores that correspond to the levels of EPQ and coverage are shown in Fig. 4 and Table 1. The graph 
demonstrates the trade-off between sensitivity and selectivity. As more homologs are found (moving to the right), more errors are made (moving 
up). The ideal method would be in the lower right corner of the graph, which corresponds to identifying many evolutionary relationships without 
selecting unrelated proteins. Three measures of percentage identity are plotted. Percentage identity within alignment is the degree of identity within 
the aligned region of the proteins, without consideration of the alignment length. Percentage identity within both is the number of identical residues 
in the aligned region as a percentage of the average length of the query and target proteins. The HSSP equation (17) is $H = 290.15\,l^{-0.562}$, where 
$l$ is length, for $10 < l < 80$; $H > 100$ for $l < 10$; $H = 24.7$ for $l > 80$. The percentage identity HSSP-adjusted score is the percent identity within 
the alignment minus $H$. Smith-Waterman raw scores and E-values were taken directly from the sequence comparison program. 



perfect separation, with all of the homologs at the top of the 
list and unrelated proteins below. In practice, perfect separa- 
tion is impossible to achieve so instead one is interested in 
drawing a threshold above which there are the largest number 
of related pairs of sequences consistent with an acceptable 
error rate. 

Our procedure involved measuring the coverage and error 
for every threshold. Coverage was defined as the fraction of 
structurally determined homologs that have scores above the 
selected threshold; this reflects the sensitivity of a method. 
Errors per query (EPQ), an indicator of selectivity, is the 
number of nonhomologous pairs above the threshold divided 
by the number of queries. Graphs of these data, called 
coverage vs. error plots, were devised to understand how 



protocols compare at different levels of accuracy. These 
graphs share effectively all of the beneficial features of Receiver 
Operating Characteristic (ROC) plots (33, 34) but 
better represent the high degrees of accuracy required in 
sequence comparison and the huge background of nonhomologs. 
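
The procedure described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function names, the input layout (a list of score/label pairs), and the convention that a higher score is better are all assumptions made for the example. A threshold is treated as inclusive, so every pair scoring at or above it counts as reported.

```python
from typing import List, Tuple


def coverage_vs_error(
    scored_pairs: List[Tuple[float, bool]],
    total_homologs: int,
    num_queries: int,
) -> List[Tuple[float, float, float]]:
    """Return (threshold, coverage, EPQ) for every distinct score.

    scored_pairs holds (score, is_homolog) for each reported pair.
    Higher scores are assumed to be better (hypothetical convention;
    negate E-values before calling).
    """
    ranked = sorted(scored_pairs, key=lambda p: p[0], reverse=True)
    curve = []
    true_hits = 0
    errors = 0
    for i, (score, is_homolog) in enumerate(ranked):
        if is_homolog:
            true_hits += 1
        else:
            errors += 1
        # Emit a point only after all pairs tied at this score are counted.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        curve.append((score, true_hits / total_homologs, errors / num_queries))
    return curve


def best_coverage_at(curve, max_epq: float) -> float:
    """Largest coverage whose EPQ does not exceed max_epq (0.0 if none)."""
    allowed = [cov for _, cov, epq in curve if epq <= max_epq]
    return max(allowed, default=0.0)


# Toy data, invented for illustration: 4 known homologous pairs, 2 queries.
toy = [(50, True), (40, True), (30, False), (20, True), (10, False)]
curve = coverage_vs_error(toy, total_homologs=4, num_queries=2)
```

With the figures quoted in the Fig. 1 caption, the same definitions give 904/9,044, or about 10% coverage, and 13/1,323, or about 1% EPQ.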

This assessment procedure is directly relevant to practical 
sequence database searching, for it provides precisely the 
information necessary to perform a reliable sequence database 
search. The EPQ measure places a premium on score consis- 
tency; that is, it requires scores to be comparable for different 
queries. Consistency is an aspect which has been largely 

[Fig. 3 panel title: Percent Identity of Unrelated Proteins (PDB90D-B)] 
[Fig. 2 panels: Hemoglobin β-chain (1hdsb), left; Cellulase E2 (1tml), right. Sequence alignment not recoverable.] 




Fig. 2. Unrelated proteins with high percentage identity. Hemoglobin 
β-chain (PDB code 1hds chain b, ref. 38, Left) and cellulase E2 
(PDB code 1tml, ref. 39, Right) have 39% identity over 64 residues, a 
level which is often believed to be indicative of homology. Despite this 
high degree of identity, their structures strongly suggest that these 
proteins are not related. Appropriately, neither the raw alignment 
score of 85 nor the E-value of 1.3 is significant. Proteins rendered by 
RASMOL (40). 



[Fig. 3 axis label: Alignment length] 

Fig. 3. Length and percentage identity of alignments of unrelated 
proteins in PDB90D-B: Each pair of nonhomologous proteins found with 
SSEARCH is plotted as a point whose position indicates the length and 
the percentage identity within the alignment. Because alignment 
length and percentage identity are quantized, many pairs of proteins 
may have exactly the same alignment length and percentage identity. 
The line shows the HSSP threshold (though it is intended to be applied 
with a different matrix and parameters). 






6076 Biochemistry: Brenner et al. Proc. Natl. Acad. Sci. USA 95 (1998) 



Reliability of Statistical Scores (PDB90D-B) 
Fig. 4. Reliability of statistical scores in PDB90D-B: Each line shows 
the relationship between reported statistical score and actual error 
rate for a different program. E-values are reported for SSEARCH and 
FASTA, whereas P-values are shown for BLAST and WU-BLAST2. If the 
scoring were perfect, then the number of errors per query and the 
E-values would be the same, as indicated by the upper bold line. 
(P-values should be the same as EPQ for small numbers, and diverge 
at higher values, as indicated by the lower bold line.) E-values from 
SSEARCH and FASTA are shown to have good agreement with EPQ but 
underestimate the significance slightly. BLAST and WU-BLAST2 are 
overconfident, with the degree of exaggeration dependent upon the 
score. The results for PDB40D-B were similar to those for PDB90D-B 
despite the difference in number of homologs detected. This graph 
could be used to roughly calibrate the reliability of a given statistical 
score. 

ignored in previous tests but is essential for the straightforward 
or automatic interpretation of sequence comparison results. 
Further, it provides a clear indication of the confidence that 
should be ascribed to each match. Indeed, the EPQ measure 
should approximate the expectation value reported by database 
searching programs, if the programs' estimates are accurate. 

The Performance of Scoring Schemes. All of the programs 
tested could provide three fundamental types of scores. The 
first score is the percentage identity, which may be computed 
in several ways based on either the length of the alignment or 
the lengths of the sequences. The second is a "raw" or 
"Smith-Waterman" score, which is the measure optimized by 
the Smith-Waterman algorithm and is computed by summing 
the substitution matrix scores for each position in the alignment 
and subtracting gap penalties. In BLAST, a measure 

[Fig. 5A panel title: Sequence Comparison Algorithms (PDB40D-B)] 
related to this score is scaled into bits. Third is a statistical 
score based on the extreme value distribution. These results 
are summarized in Fig. 1. 
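
As an illustration of the raw score defined above, the sketch below scores an already finished alignment by summing substitution scores and subtracting gap penalties. It is a hedged example, not the scoring code of any of the programs tested: the function name is hypothetical, the tiny substitution table is invented and is not BLOSUM45, and reading "-12/-1" as 12 for the first gapped position and 1 for each further position in the same gap is an assumption.

```python
from typing import Dict, Tuple


def raw_score(
    aligned_query: str,
    aligned_target: str,
    matrix: Dict[Tuple[str, str], int],
    gap_first: int = 12,
    gap_next: int = 1,
) -> int:
    """Score a finished alignment; '-' marks a gap in either row."""
    if len(aligned_query) != len(aligned_target):
        raise ValueError("aligned rows must have equal length")
    score = 0
    gap_row = None  # row holding the currently open gap: 'q', 't', or None
    for a, b in zip(aligned_query, aligned_target):
        if a == "-" and b == "-":
            raise ValueError("column with two gaps")
        if a == "-" or b == "-":
            row = "q" if a == "-" else "t"
            # Continuing a gap in the same row is cheaper than opening one.
            score -= gap_next if gap_row == row else gap_first
            gap_row = row
        else:
            score += matrix[(a, b)]
            gap_row = None
    return score


# Invented toy table (NOT BLOSUM45): +5 for identity, -2 otherwise.
toy_matrix = {(a, b): (5 if a == b else -2) for a in "ACDG" for b in "ACDG"}
ungapped = raw_score("ACDG", "ACGG", toy_matrix)
gapped = raw_score("ACDDG", "AC--G", toy_matrix)
```

A real substitution matrix also rewards conservative replacements and penalizes radical ones, which the toy table does not attempt to model.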

Sequence Identity. Though it has been long established that 
percentage identity is a poor measure (35), there is a common 
rule-of-thumb stating that 30% identity signifies homology. 
Moreover, publications have indicated that 25% identity can 
be used as a threshold (17, 36). We find that these thresholds, 
originally derived years ago, are not supported by present 
results. As databases have grown, so have the possibilities for 
chance alignments with high identity; thus, the reported cutoffs 
lead to frequent errors. Fig. 2 shows one of the many pairs of 
proteins with very different structures that nonetheless have 
high levels of identity over considerable aligned regions. 
Despite the high identity, the raw and the statistical scores for 
such incorrect matches are typically not significant. The prin- 
cipal reasons percentage identity does so poorly seem to be 
that it ignores information about gaps and about the conser- 
vative or radical nature of residue substitutions. 

From the PDB90D-B analysis in Fig. 3, we learn that 30% 
identity is a reliable threshold for this database only for 
sequence alignments of at least 150 residues. Because one 
unrelated pair of proteins has 43.5% identity over 62 residues, 
it is probably necessary for alignments to be at least 70 residues 
in length before 40% is a reasonable threshold, for a database 
of this particular size and composition. 

At a given reliability, scores based on percentage identity 
detect just a fraction of the distant homologs found by 
statistical scoring. If one measures the percentage identity in 
the aligned regions without consideration of alignment length, 
then a negligible number of distant homologs are detected. 
Use of the HSSP equation improves the value of percentage 
identity, but even this measure can find only 4% of all known 
homologs at 1% EPQ. In short, percentage identity discards 
most of the information measured in a sequence comparison. 
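
The three identity measures compared here can be written out as follows. This is a sketch under stated assumptions rather than the authors' implementation: the function names are hypothetical, alignment length is taken to be the number of alignment columns including gap columns, and the exponent -0.562 in the HSSP curve is illegible in this copy and is taken from the published HSSP equation. With that exponent the curve reaches about 24.7 at a length of 80, which matches the plateau quoted in the Fig. 1 caption. The caption leaves lengths of exactly 10 and 80 unspecified; the sketch applies the curve there.

```python
def hssp_threshold(length: int) -> float:
    """HSSP curve H(l): identity (%) expected to indicate homology."""
    if length < 10:
        # The caption only states H > 100 here, so no identity can pass.
        return float("inf")
    if length > 80:
        return 24.7
    return 290.15 * length ** -0.562  # exponent assumed, see lead-in


def identity_measures(aligned_query: str, aligned_target: str,
                      query_len: int, target_len: int):
    """Return (within alignment, within both, HSSP-adjusted) in percent."""
    columns = list(zip(aligned_query, aligned_target))
    identical = sum(1 for a, b in columns if a == b and a != "-")
    align_len = len(columns)  # assumption: gap columns count toward length
    within_alignment = 100.0 * identical / align_len
    within_both = 100.0 * identical / ((query_len + target_len) / 2)
    hssp_adjusted = within_alignment - hssp_threshold(align_len)
    return within_alignment, within_both, hssp_adjusted


# The Fig. 2 pair: 39% identity over a 64-residue alignment.
margin = 39.0 - hssp_threshold(64)
```

For the pair in Fig. 2 the curve evaluates to roughly 28%, so 39% identity over 64 residues lies well above it even though the structures indicate that the proteins are unrelated.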

Raw Scores. Smith-Waterman raw scores perform better 
than percentage identity (Fig. 1), but ln-scaling (7) provided no 
notable benefit in our analysis. It is necessary to be very precise 
when using either raw or bit scores because a 20% change in 
cutoff score could yield a tenfold difference in EPQ. However, 
it is difficult to choose appropriate thresholds because the 
reliability of a bit score depends on the lengths of the proteins 
matched and the size of the database. Raw score thresholds 
also are affected by matrix and gap parameters. 

Statistical Scores. Statistical scores were introduced partly 
to overcome the problems that arise from raw scores. This 
scoring scheme provides the best discrimination between 
homologous proteins and those which are unrelated. Most 

[Fig. 5 panel title: Sequence Comparison Algorithms (PDB90D-B); x axis: Coverage] 



Fig. 5. Coverage vs. error plots of different sequence comparison methods: Five different sequence comparison methods are evaluated, each 
using statistical scores (E- or P-values). (A) PDB40D-B database. In this analysis, the best method is the slow SSEARCH, which finds 18% of relationships 
at 1% EPQ. FASTA ktup = 1 and WU-BLAST2 are almost as good. (B) PDB90D-B database. The quick WU-BLAST2 program provides the best coverage 
at 1% EPQ on this database, although at higher levels of error it becomes slightly worse than FASTA ktup = 1 and SSEARCH. 
likely, its power can be attributed to its incorporation of more 
information than any other measure; it takes account of the 
full substitution and gap data (like raw scores) but also has 
details about the sequence lengths and composition and is 
scaled appropriately. 

We find that statistical scores are not only powerful, but also 
easy to interpret. SSEARCH and FASTA show close agreement 
between statistical scores and actual number of errors per 
query (Fig. 4). The expectation value score gives a good, 
slightly conservative estimate of the chances of the two se- 
quences being found at random in a given query. Thus, an 
E-value of 0.01 indicates that roughly one pair of nonhomologs 
of this similarity should be found in every 100 different queries. 
Neither raw scores nor percentage identity can be interpreted 
in this way, and these results validate the suitability of the 
extreme value distribution for describing the scores from a 
database search. 
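
The interpretation of an E-value given above, and its relation to a P-value, can be made concrete with a short calculation. The sketch assumes the usual Poisson model for chance hits, under which P = 1 - exp(-E); that formula is a standard assumption and is not stated in the text. The function names are hypothetical.

```python
import math


def p_value_from_e_value(e_value: float) -> float:
    """Probability of at least one chance hit this good (Poisson model)."""
    # -expm1(-E) equals 1 - exp(-E) but stays accurate for tiny E.
    return -math.expm1(-e_value)


def expected_false_positives(e_value_cutoff: float, num_queries: int) -> float:
    """Expected number of nonhomologous pairs reported over many queries."""
    return e_value_cutoff * num_queries


# An E-value cutoff of 0.01 applied to 100 different queries.
errors_in_100_queries = expected_false_positives(0.01, 100)
small = p_value_from_e_value(0.01)   # nearly equal to the E-value
large = p_value_from_e_value(10.0)   # saturates just below 1
```

Under this model the two scores agree closely for small values and diverge as the E-value grows, which is the behavior the Fig. 4 caption attributes to an ideal P-value.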

The P-values from BLAST also should be directly interpretable 
but were found to overstate significance by more than two 
orders of magnitude for 1% EPQ for this database. Nonetheless, 
these results strongly suggest that the analytic theory is 
fundamentally appropriate. WU-BLAST2 scores were more reliable 
than those from BLAST, but also exaggerate expected 
confidence by more than an order of magnitude at 1% EPQ. 

Overall Detection of Homologs and Comparison of Algorithms. 
The results in Fig. 5A and Table 1 show that pairwise 
sequence comparison is capable of identifying only a small 
fraction of the homologous pairs of sequences in PDB40D-B. 
Even SSEARCH with E-values, the best protocol tested, could 
find only 18% of all relationships at a 1% EPQ. BLAST, which 
identifies 15%, was the worst performer, whereas FASTA 
ktup = 1 is nearly as effective as SSEARCH. FASTA ktup = 2 and 
WU-BLAST2 are intermediate in their ability to detect homologs. 
Comparison of different algorithms indicates that 
those capable of identifying more homologs are generally 
slower. SSEARCH is 25 times slower than BLAST and 6.5 times 
slower than FASTA ktup = 1. WU-BLAST2 is slightly faster than 
FASTA ktup = 2, but the latter has more interpretable scores. 
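
The speed ratios quoted above follow from the relative times in Table 1, and the same table shows how far each reported score sits from the observed error rate on PDB40D-B. The sketch below is illustrative arithmetic only. The values are transcribed from Table 1; the relative times of 1.1 for WU-BLAST2 and 1.0 for BLAST are read from a poorly scanned copy and should be treated as an assumption. The function names are hypothetical, and P-values are treated as comparable to E-values at these small magnitudes, as the Fig. 4 caption indicates.

```python
# (relative time, statistical score that corresponds to 1% EPQ), per Table 1.
TABLE_1 = {
    "SSEARCH E-values": (25.5, 0.03),
    "FASTA ktup = 1 E-values": (3.9, 0.03),
    "FASTA ktup = 2 E-values": (1.4, 0.03),
    "WU-BLAST2 P-values": (1.1, 0.003),   # time value assumed, see lead-in
    "BLAST P-values": (1.0, 0.00016),     # time value assumed, see lead-in
}
TARGET_EPQ = 0.01


def slowdown(method: str, baseline: str) -> float:
    """How many times slower `method` is than `baseline`."""
    return TABLE_1[method][0] / TABLE_1[baseline][0]


def overstatement(method: str) -> float:
    """Observed EPQ divided by the score reported at that EPQ.

    Values above 1 mean the score claims fewer errors than were observed
    (overconfident); values below 1 mean the score is conservative.
    """
    return TARGET_EPQ / TABLE_1[method][1]


ssearch_vs_blast = slowdown("SSEARCH E-values", "BLAST P-values")
ssearch_vs_fasta = slowdown("SSEARCH E-values", "FASTA ktup = 1 E-values")
factors = {name: overstatement(name) for name in TABLE_1}
```

On these numbers the SSEARCH and FASTA E-values come out conservative by about a factor of three, while the BLAST P-value at 1% EPQ is smaller than the observed error rate by a factor of about 60.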

In PDB90D-B, where there are many close relationships, the 
best method can identify only 38% of structurally known 
homologs (Fig. 5B). The method which finds that many 
relationships is WU-BLAST2. Consequently, we infer that the 
differences between FASTA ktup = 1, SSEARCH, and WU-BLAST2 
programs are unlikely to be significant when compared with 
variation in database composition and scoring reliability. 

Fig. 6 helps to explain why most distant homologs cannot be 
found by sequence comparison: a great many such relationships 
have no more sequence identity than would be expected 
by chance. SSEARCH with E-values can recognize >90% of the 
homologous pairs with 30-40% identity. In this region, there 
are 30 pairs of homologous proteins that do not have significant 
E-values, but 26 of these involve sequences with <50 
residues. Of sequences having 25-30% identity, 75% are 
identified by SSEARCH E-values. However, although the number 
of homologs grows at lower levels of identity, the detection 
falls off sharply: only 40% of homologs with 20-25% identity 
Fig. 6. Distribution and detection of homologs in PDB40D-B. Bars 
show the distribution of homologous pairs in PDB40D-B according to their 
identity (using the measure of identity in both). Filled regions indicate 
the number of these pairs found by the best database searching method 
(SSEARCH with E-values) at 1% EPQ. The PDB40D-B database contains 
proteins with <40% identity, and as shown on this graph, most 
structurally identified homologs in the database have diverged extremely 
far in sequence and have <20% identity. Note that the 
alignments may be inaccurate, especially at low levels of identity. Filled 
regions show that SSEARCH can identify most relationships that have 
25% or more identity, but its detection wanes sharply below 25%. 
Consequently, the great sequence divergence of most structurally 
identified evolutionary relationships effectively defeats the ability of 
pairwise sequence comparison to detect them. 

are detected and only 10% of those with 15-20% can be found. 
These results show that statistical scores can find related 
proteins whose identity is remarkably low; however, the power 
of the method is restricted by the great divergence of many 
protein sequences. 
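
The breakdown of detection by identity range reported above amounts to binning the known homologous pairs and counting how many were found in each bin. The following is a minimal sketch with a hypothetical function name and invented sample data; it does not reproduce the PDB40D-B counts.

```python
from typing import Dict, Iterable, List, Tuple


def detection_by_identity(
    homologs: Iterable[Tuple[float, bool]],
    bin_edges: List[float],
) -> Dict[Tuple[float, float], float]:
    """Fraction of homologous pairs detected in each [low, high) bin.

    homologs holds (percent identity, was_detected) for each known pair.
    Bins that contain no pairs are left out of the result.
    """
    counts = {(lo, hi): [0, 0] for lo, hi in zip(bin_edges, bin_edges[1:])}
    for identity, detected in homologs:
        for (lo, hi), tally in counts.items():
            if lo <= identity < hi:
                tally[0] += 1              # pairs in this bin
                tally[1] += int(detected)  # pairs found at the cutoff
                break
    return {b: found / total for b, (total, found) in counts.items() if total}


# Invented example data: (identity in %, detected at the chosen cutoff).
sample = [(32, True), (35, True), (22, False), (23, True), (17, False)]
rates = detection_by_identity(sample, [15, 20, 25, 30, 40])
```

Applied to real data, a table of this kind would show the pattern described in the text: high detection rates above 25% identity and a steep decline below it.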

After completion of this work, a new version of pairwise 
BLAST was released: BLASTGP (37). It supports gapped alignments, 
like WU-BLAST2, and dispenses with sum statistics. Our 
initial tests on BLASTGP using default parameters show that its 
E-values are reliable and that its overall detection of homologs 
was substantially better than that of ungapped BLAST, but not 
quite equal to that of WU-BLAST2. 

CONCLUSION 

The general consensus amongst experts (see refs. 7, 24, 25, 27 
and references therein) suggests that the most effective sequence 
searches are made by (i) using a large current database 
in which the protein sequences have been complexity masked 
and (ii) using statistical scores to interpret the results. Our 
experiments fully support this view. 

Our results also suggest two further points. First, the E-values 
reported by FASTA and SSEARCH give fairly accurate 
estimates of the significance of each match, but the P-values 
provided by BLAST and WU-BLAST2 underestimate the true 



Table 1. Summary of sequence comparison methods with PDB40D-B 

Method                                 Relative Time*   1% EPQ Cutoff       Coverage at 1% EPQ (%) 
SSEARCH % identity: within alignment   25.5             >70%                <0.1 
SSEARCH % identity: within both        25.5             34%                 3.0 
SSEARCH % identity: HSSP-scaled        25.5             35% (HSSP + 9.8)    4.0 
SSEARCH Smith-Waterman raw scores      25.5             142                 10.5 
SSEARCH E-values                       25.5             0.03                18.4 
FASTA ktup = 1 E-values                3.9              0.03                17.9 
FASTA ktup = 2 E-values                1.4              0.03                16.7 
WU-BLAST2 P-values                     1.1              0.003               17.5 
BLAST P-values                         1.0              0.00016             14.8 


*Times are from large database searches with genome proteins. 
extent of errors. Second, SSEARCH, WU-BLAST2, and FASTA 
ktup = 1 perform best, though BLAST and FASTA ktup = 2 
detect most of the relationships found by the best procedures 
and are appropriate for rapid initial searches. 

The homologous proteins that are found by sequence com- 
parison can be distinguished with high reliability from the huge 
number of unrelated pairs. However, even the best database 
searching procedures tested fail to find the large majority of 
distant evolutionary relationships at an acceptable error rate. 
Thus, if the procedures assessed here fail to find a reliable 
match, it does not imply that the sequence is unique; rather, it 
indicates that any relatives it might have are distant ones.** 



**Additional and updated information about this work, including 
supplementary figures, may be found at http://sss.stanford.edu/sss/. 